This is the official PyTorch implementation of our ICCV 2023 paper "Neural Interactive Keypoint Detection".
Jie Yang, Ailing Zeng, Feng Li, Shilong Liu, Ruimao Zhang, Lei Zhang
Keywords: 👯 Multi-person 2D pose estimation, 💃 Human-in-the-loop, 🤝 Interactive model
- Click-Pose is now supported in our DeepDataSpace platform. See how to perform intelligent labeling with DDS here.
- All models for COCO, Human-Art, OCHuman, and CrowdPose are released!
- Workflow: 🤖 Model localizes all keypoints -> 👨 User corrects a few wrong keypoints -> 🤖 Model refines the other keypoints (see the sketch after this list)
- 👇 We first propose an interactive keypoint detection task for efficient keypoint annotation.
- 👇 We present the first neural interactive keypoint detection framework, Click-Pose, an end-to-end baseline to annotate multi-person 2D keypoints given an image.
- 👇 Click-Pose is more than 10 times faster than manual annotation. Importantly, it significantly alleviates model bias in out-of-domain annotation (e.g., on Human-Art), reducing the time required by 83% compared to state-of-the-art model annotation (ViTPose) with manual correction.
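To make the workflow concrete, here is a minimal, illustrative Python sketch of the model-in-the-loop annotation cycle. All callables passed in (detect, refine, pick_correction, good_enough) are hypothetical placeholders standing in for the pose model and the human annotator; they are not functions exposed by this repository.

```python
from typing import Any, Callable, List, Tuple

def interactive_annotation(
    image: Any,
    detect: Callable[[Any], Any],              # model: localize all keypoints
    refine: Callable[[Any, List[Any]], Any],   # model: re-predict conditioned on user clicks
    pick_correction: Callable[[Any], Any],     # user: fix one wrong keypoint, return the click
    good_enough: Callable[[Any], bool],        # stop criterion, e.g. target accuracy reached
    max_clicks: int = 17,
) -> Tuple[Any, List[Any]]:
    """Model localizes -> user corrects one keypoint -> model refines, repeated."""
    poses = detect(image)                      # step 1: initial prediction for all persons
    clicks: List[Any] = []
    while len(clicks) < max_clicks and not good_enough(poses):
        clicks.append(pick_correction(poses))  # step 2: a single user correction
        poses = refine(image, clicks)          # step 3: refine the remaining keypoints
    return poses, clicks
```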
Results on COCO val2017

| Model | Backbone | Lr schd | mAP | AP50 | AP75 | APM | APL | Time (ms) | Download |
|---|---|---|---|---|---|---|---|---|---|
| ED-Pose | ResNet-50 | 60e | 71.7 | 89.7 | 78.8 | 66.2 | 79.7 | 51 | GitHub, Model |
| Click-Pose | ResNet-50 | 40e | 73.0 | 90.4 | 80.0 | 68.1 | 80.5 | 48 | Google Drive |
Results on Human-Art val

| Model | Backbone | mAP | APM | APL | Download |
|---|---|---|---|---|---|
| ED-Pose | ResNet-50 | 37.5 | 7.6 | 41.1 | GitHub, Model |
| Click-Pose | ResNet-50 | 40.5 | 8.3 | 44.2 | Google Drive |
Results on OCHuman test

| Model | Backbone | mAP | AP50 | AP75 | Download |
|---|---|---|---|---|---|
| ED-Pose | ResNet-50 | 31.4 | 39.5 | 35.1 | GitHub, Model |
| Click-Pose | ResNet-50 | 33.9 | 43.4 | 37.5 | Google Drive |
Note that the models above are trained on the COCO train2017 set and tested on the COCO val2017 set, Human-Art val set, and OCHuman test set.
NoC on COCO val2017 (in-domain annotation)

| Model | Backbone | NoC@85 | NoC@90 | NoC@95 | Download |
|---|---|---|---|---|---|
| ViTPose | ViT-Huge | 1.46 | 2.15 | 2.87 | GitHub, Model |
| Click-Pose | ResNet-50 | 0.95 | 1.48 | 1.97 | Google Drive |
NoC on Human-Art val (out-of-domain annotation)

| Model | Backbone | NoC@85 | NoC@90 | NoC@95 | Download |
|---|---|---|---|---|---|
| ViTPose | ViT-Huge | 9.12 | 9.79 | 10.13 | GitHub, Model |
| Click-Pose | ResNet-50 | 4.82 | 5.81 | 6.45 | Google Drive |
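For reference, NoC@q (Number of Clicks) is, to our understanding, the average number of user corrections needed before a prediction reaches an OKS of q (0.85, 0.90, or 0.95 in the tables above). A minimal sketch of how such a metric could be aggregated is shown below; run_interactive is a hypothetical stand-in for the full model -> click -> refine loop, not a function from this repository.

```python
from statistics import mean
from typing import Any, Callable, Iterable

def noc_at(
    samples: Iterable[Any],
    run_interactive: Callable[[Any, float, int], int],  # returns #clicks used for one sample
    threshold: float = 0.95,
    max_clicks: int = 17,
) -> float:
    """Average number of clicks needed to push each sample's OKS above `threshold`.

    `run_interactive(sample, threshold, max_clicks)` is assumed to perform the
    interactive loop and return the number of corrections issued before the
    target OKS is reached (capped at `max_clicks` if it never is).
    """
    return mean(run_interactive(s, threshold, max_clicks) for s in samples)
```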
Installation
We use ED-Pose as our codebase. Our models are tested under python=3.7.3, pytorch=1.9.0, cuda=11.1. Other versions may work as well.
- Clone this repo
git clone https://github.com/IDEA-Research/Click-Pose.git
cd Click-Pose
- Install PyTorch and torchvision
Follow the instructions at https://pytorch.org/get-started/locally/.
# an example:
conda install -c pytorch pytorch torchvision
- Install other needed packages
pip install -r requirements.txt
- Compiling CUDA operators
cd models/clickpose/ops
python setup.py build install
# unit test (should see all checking is True)
python test.py
cd ../../..
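If the build or the unit test fails, it is usually worth confirming first that your PyTorch install can actually see the CUDA toolkit. The snippet below uses only standard PyTorch calls; the version numbers in the comments refer to the tested setup above.

```python
import torch

print(torch.__version__)            # tested with 1.9.0
print(torch.version.cuda)           # tested with 11.1
print(torch.cuda.is_available())    # must be True to build and run the CUDA operators
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```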
Data Preparation
For COCO data, please download from COCO download. The coco_dir should look like this:
|-- Click-Pose
`-- |-- coco_dir
`-- |-- annotations
| |-- person_keypoints_train2017.json
| `-- person_keypoints_val2017.json
`-- images
|-- train2017
| |-- 000000000009.jpg
| |-- 000000000025.jpg
| |-- 000000000030.jpg
| |-- ...
`-- val2017
|-- 000000000139.jpg
|-- 000000000285.jpg
|-- 000000000632.jpg
|-- ...
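As a quick sanity check of the layout above, you can open the annotation files with pycocotools and compare them against the image folders. The path below is illustrative; point coco_dir at the directory you will later export as CLICKPOSE_COCO_PATH.

```python
import os
from pycocotools.coco import COCO

coco_dir = "/path/to/your/coco_dir"  # the directory exported as CLICKPOSE_COCO_PATH later
for split in ("train2017", "val2017"):
    ann_file = os.path.join(coco_dir, "annotations", f"person_keypoints_{split}.json")
    img_dir = os.path.join(coco_dir, "images", split)
    coco = COCO(ann_file)  # loads and indexes the keypoint annotations
    print(f"{split}: {len(coco.getImgIds())} images in annotations, "
          f"{len(os.listdir(img_dir))} files on disk")
```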
For Human-Art data, please download from Human-Art download. The humanart_dir should look like this:
|-- Click-Pose
`-- |-- humanart_dir
`-- |-- annotations
| |-- training_humanart.json
| |-- validation_humanart.json
`-- images
|-- 2D_virtual_human
|-- ...
|-- 3D_virtual_human
|-- ...
|-- real_human
|-- ...
For CrowdPose data, please download from CrowdPose download. The crowdpose_dir should look like this:
|-- Click-Pose
`-- |-- crowdpose_dir
`-- |-- json
| |-- crowdpose_train.json
| |-- crowdpose_val.json
| |-- crowdpose_trainval.json (generated by util/crowdpose_concat_train_val.py)
| `-- crowdpose_test.json
`-- images
|-- 100000.jpg
|-- 100001.jpg
|-- 100002.jpg
|-- 100003.jpg
|-- 100004.jpg
|-- 100005.jpg
|-- ...
For OCHuman data, please download from OCHuman download. The ochuman_dir should look like this:
|-- Click-Pose
`-- |-- ochuman_dir
`-- |-- annotations
`-- images
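Before launching the commands below, a quick check that the dataset directories you plan to export actually contain the expected sub-folders can save a failed run. The environment-variable names match the ones used in the following commands; only the datasets you intend to use need to be set.

```python
import os

# Expected top-level sub-folders, per the directory trees above.
expected = {
    "CLICKPOSE_COCO_PATH": ("annotations", "images"),
    "CLICKPOSE_HumanArt_PATH": ("annotations", "images"),
    "CLICKPOSE_OCHuman_PATH": ("annotations", "images"),
}
for var, subdirs in expected.items():
    root = os.environ.get(var)
    if root is None:
        print(f"{var} is not set, skipping")
        continue
    for sub in subdirs:
        path = os.path.join(root, sub)
        print(f"{var}: {path} ->", "ok" if os.path.isdir(path) else "MISSING")
```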
Training on COCO

Model-Only
export CLICKPOSE_COCO_PATH=/path/to/your/coco_dir
python -m torch.distributed.launch --nproc_per_node=4 main.py \
--output_dir "logs/ClickPose_Model-Only" \
-c config/clickpose.cfg.py \
--options batch_size=4 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=FALSE feedback_loop_NOC_test=FALSE feedback_inference=FALSE only_correction=FALSE \
--dataset_file="coco"
Neural Interactive
export CLICKPOSE_COCO_PATH=/path/to/your/coco_dir
python -m torch.distributed.launch --nproc_per_node=4 main.py \
--output_dir "logs/ClickPose_Neural_Interactive" \
-c config/clickpose.cfg.py \
--options batch_size=4 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=TRUE feedback_loop_NOC_test=FALSE feedback_inference=FALSE only_correction=FALSE \
--dataset_file="coco"
Evaluation on COCO

Model-Only
export CLICKPOSE_COCO_PATH=/path/to/your/coco_dir
python -m torch.distributed.launch --nproc_per_node=4 main.py \
--output_dir "logs/ClickPose_Model-Only_eval" \
-c config/clickpose.cfg.py \
--options batch_size=4 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=FALSE feedback_loop_NOC_test=FALSE feedback_inference=FALSE only_correction=FALSE \
--dataset_file="coco" \
--pretrain_model_path "./models/ClickPose_model_only_R50.pth" \
--eval
Neural Interactive-NoC metric
export CLICKPOSE_COCO_PATH=/path/to/your/coco_dir
export CLICKPOSE_NoC_Test="TRUE"
export CLICKPOSE_SAVE_PATH="./NoC_95_coco.json"
export NoC_thr=0.95
python -m torch.distributed.launch --nproc_per_node=1 --master_port 3458 main.py \
--output_dir "logs/ClickPose_Neural_Interactive_eval" \
-c config/clickpose.cfg.py \
--options batch_size=1 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=TRUE feedback_loop_NOC_test=TRUE feedback_inference=TRUE only_correction=FALSE num_select=20 \
--dataset_file="coco" \
--pretrain_model_path "./models/ClickPose_interactive_R50.pth" \
--eval
Neural Interactive-AP metric
export CLICKPOSE_COCO_PATH=/path/to/your/coco_dir
export CLICKPOSE_NoC_Test="TRUE"
for CLICKPOSE_Click_Number in {1..17}
do
python -m torch.distributed.launch --nproc_per_node=4 --master_port 3458 main.py \
--output_dir "logs/ClickPose_Neural_Interactive_eval" \
-c config/clickpose.cfg.py \
--options batch_size=4 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=TRUE feedback_loop_NOC_test=FALSE feedback_inference=TRUE only_correction=FALSE num_select=20 \
--dataset_file="coco" \
--pretrain_model_path "./models/ClickPose_interactive_R50.pth" \
--eval
done
Evaluation on Human-Art

Model-Only
export CLICKPOSE_HumanArt_PATH=/path/to/your/humanart_dir
python -m torch.distributed.launch --nproc_per_node=4 main.py \
--output_dir "logs/ClickPose_Model-Only_eval" \
-c config/clickpose.cfg.py \
--options batch_size=4 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=FALSE feedback_loop_NOC_test=FALSE feedback_inference=FALSE only_correction=FALSE \
--dataset_file="humanart" \
--pretrain_model_path "./models/ClickPose_model_only_R50.pth" \
--eval
Neural Interactive-NoC metric
export CLICKPOSE_HumanArt_PATH=/path/to/your/humanart_dir
export CLICKPOSE_NoC_Test="TRUE"
export CLICKPOSE_SAVE_PATH="./NoC_95_humanart.json"
export NoC_thr=0.95
python -m torch.distributed.launch --nproc_per_node=1 --master_port 3458 main.py \
--output_dir "logs/ClickPose_Neural_Interactive_eval" \
-c config/clickpose.cfg.py \
--options batch_size=1 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=TRUE feedback_loop_NOC_test=TRUE feedback_inference=TRUE only_correction=FALSE num_select=20 \
--dataset_file="humanart" \
--pretrain_model_path "./models/ClickPose_interactive_R50.pth" \
--eval
Neural Interactive-AP metric
export CLICKPOSE_HumanArt_PATH=/path/to/your/humanart_dir
export CLICKPOSE_NoC_Test="TRUE"
for CLICKPOSE_Click_Number in {1..17}
do
python -m torch.distributed.launch --nproc_per_node=4 --master_port 3458 main.py \
--output_dir "logs/ClickPose_Neural_Interactive_eval" \
-c config/clickpose.cfg.py \
--options batch_size=4 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=TRUE feedback_loop_NOC_test=FALSE feedback_inference=TRUE only_correction=FALSE num_select=20 \
--dataset_file="humanart" \
--pretrain_model_path "./models/ClickPose_interactive_R50.pth" \
--eval
done
Evaluation on OCHuman

Model-Only
export CLICKPOSE_OCHuman_PATH=/path/to/your/ochuman_dir
python -m torch.distributed.launch --nproc_per_node=4 main.py \
--output_dir "logs/ClickPose_Model-Only_eval" \
-c config/clickpose.cfg.py \
--options batch_size=4 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=FALSE feedback_loop_NOC_test=FALSE feedback_inference=FALSE only_correction=FALSE \
--dataset_file="ochuman" \
--pretrain_model_path "./models/ClickPose_model_only_R50.pth" \
--eval
Neural Interactive-NoC metric
export CLICKPOSE_OCHuman_PATH=/path/to/your/ochuman_dir
export CLICKPOSE_NoC_Test="TRUE"
export CLICKPOSE_SAVE_PATH="./NoC_95_ochuman.json"
export NoC_thr=0.95
python -m torch.distributed.launch --nproc_per_node=1 --master_port 3458 main.py \
--output_dir "logs/ClickPose_Neural_Interactive_eval" \
-c config/clickpose.cfg.py \
--options batch_size=1 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=TRUE feedback_loop_NOC_test=TRUE feedback_inference=TRUE only_correction=FALSE num_select=20 \
--dataset_file="ochuman" \
--pretrain_model_path "./models/ClickPose_interactive_R50.pth" \
--eval
Neural Interactive-AP metric
export CLICKPOSE_OCHuman_PATH=/path/to/your/ochuman_dir
export CLICKPOSE_NoC_Test="TRUE"
for CLICKPOSE_Click_Number in {1..17}
do
python -m torch.distributed.launch --nproc_per_node=4 --master_port 3458 main.py \
--output_dir "logs/ClickPose_Neural_Interactive_eval" \
-c config/clickpose.cfg.py \
--options batch_size=4 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=TRUE feedback_loop_NOC_test=FALSE feedback_inference=TRUE only_correction=FALSE num_select=20 \
--dataset_file="ochuman" \
--pretrain_model_path "./models/ClickPose_interactive_R50.pth" \
--eval
done
If you find this repository useful for your work, please consider citing our papers as follows:
@inproceedings{yang2023neural,
title={Neural Interactive Keypoint Detection},
author={Yang, Jie and Zeng, Ailing and Li, Feng and Liu, Shilong and Zhang, Ruimao and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={15122--15132},
year={2023}
}
@inproceedings{yang2022explicit,
title={Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation},
author={Yang, Jie and Zeng, Ailing and Liu, Shilong and Li, Feng and Zhang, Ruimao and Zhang, Lei},
booktitle={The Eleventh International Conference on Learning Representations},
year={2023}
}