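CVPR 2021 Papers and Open-Source Projects (Papers with Code)

A collection of CVPR 2021 papers and open-source projects (papers with code)!

CVPR 2021 accepted paper list: http://cvpr2021.thecvf.com/sites/default/files/2021-03/accepted_paper_ids.txt

Note 1: Everyone is welcome to open an issue to share CVPR 2021 papers and open-source projects!

Note 2: For papers from past top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision

If you want to follow the latest and best CV papers, open-source projects, and learning materials, scan the QR code to join the CVer academic discussion group! Learn from each other and improve together~

[CVPR 2021 Open-Source Paper Index]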

Best Paper

GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields

Backbone

HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers

BCNet: Searching for Network Width with Bilaterally Coupled Network

Decoupled Dynamic Filter Networks

Lite-HRNet: A Lightweight High-Resolution Network

CondenseNet V2: Sparse Feature Reactivation for Deep Networks

Diverse Branch Block: Building a Convolution as an Inception-like Unit

Scaling Local Self-Attention For Parameter Efficient Visual Backbones

ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network

Involution: Inverting the Inherence of Convolution for Visual Recognition

Coordinate Attention for Efficient Mobile Network Design

Inception Convolution with Efficient Dilation Search

RepVGG: Making VGG-style ConvNets Great Again

NAS

HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers

BCNet: Searching for Network Width with Bilaterally Coupled Network

ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search

  • Paper: https://arxiv.org/abs/2105.10154
  • Code: None

Combined Depth Space based Architecture Search For Person Re-identification

DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation

HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers

Neural Architecture Search with Random Labels

Towards Improving the Consistency, Efficiency, and Flexibility of Differentiable Neural Architecture Search

Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation

Prioritized Architecture Sampling with Monto-Carlo Tree Search

Contrastive Neural Architecture Search with Neural Architecture Comparators

AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling

ReNAS: Relativistic Evaluation of Neural Architecture Search

HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens

Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator

OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection

Inception Convolution with Efficient Dilation Search

GAN

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network

DG-Font: Deformable Generative Networks for Unsupervised Font Generation

PD-GAN: Probabilistic Diverse GAN for Image Inpainting

StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer

Regularizing Generative Adversarial Networks under Limited Data

Towards Real-World Blind Face Restoration with Generative Facial Prior

TediGAN: Text-Guided Diverse Image Generation and Manipulation

Generative Hierarchical Features from Synthesizing Images

Teachers Do More Than Teach: Compressing Image-to-Image Models

HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms

pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis

DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network

Diverse Semantic Image Synthesis via Probability Distribution Modeling

LOHO: Latent Optimization of Hairstyles via Orthogonalization

PISE: Person Image Synthesis and Editing with Decoupled GAN

DeFLOCNet: Deep Image Editing via Flexible Low-level Controls

Efficient Conditional GAN Transfer with Knowledge Propagation across Classes

Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

  • Paper: None
  • Code: None

Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs

Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation

A 3D GAN for Improved Large-pose Facial Recognition

HumanGAN: A Generative Model of Humans Images

ID-Unet: Iterative Soft and Hard Deformation for View Synthesis

CoMoGAN: continuous model-guided image-to-image translation

Training Generative Adversarial Networks in One Stage

Closed-Form Factorization of Latent Semantics in GANs

Anycost GANs for Interactive Image Synthesis and Editing

Image-to-image Translation via Hierarchical Style Disentanglement

VAE

Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders

Visual Transformer

1. End-to-End Human Pose and Mesh Reconstruction with Transformers

2. Temporal-Relational CrossTransformers for Few-Shot Action Recognition

3. Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

4. HOTR: End-to-End Human-Object Interaction Detection with Transformers

5. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

6. Pose Recognition with Cascade Transformers

7. Variational Transformer Networks for Layout Generation

8. LoFTR: Detector-Free Local Feature Matching with Transformers

9. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

10. Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers

11. Transformer Tracking

12. HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers

13. MIST: Multiple Instance Spatial Transformer

14. Multimodal Motion Prediction with Stacked Transformers

15. Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning

16. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

17. Pre-Trained Image Processing Transformer

18. End-to-End Video Instance Segmentation with Transformers

19. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

20. End-to-End Human Object Interaction Detection with HOI Transformer

21. Transformer Interpretability Beyond Attention Visualization

22. Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer

  • Paper: None
  • Code: None

23. LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity

  • Paper: None
  • Code: None

24. Line Segment Detection Using Transformers without Edges

25. MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers

26. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation

27. Facial Action Unit Detection With Transformers

  • Paper: None
  • Code: None

28. Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition

  • Paper: None
  • Code: None

29. Lesion-Aware Transformers for Diabetic Retinopathy Grading

  • Paper: None
  • Code: None

30. Topological Planning With Transformers for Vision-and-Language Navigation

31. Adaptive Image Transformer for One-Shot Object Detection

  • Paper: None
  • Code: None

32. Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos

  • Paper: None
  • Code: None

33. Taming Transformers for High-Resolution Image Synthesis

34. Self-Supervised Video Hashing via Bidirectional Transformers

  • Paper: None
  • Code: None

35. Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos

36. Gaussian Context Transformer

  • Paper: None
  • Code: None

37. General Multi-Label Image Classification With Transformers

38. Bottleneck Transformers for Visual Recognition

39. VLN BERT: A Recurrent Vision-and-Language BERT for Navigation

40. Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

41. Self-attention based Text Knowledge Mining for Text Detection

42. SSAN: Separable Self-Attention Network for Video Representation Learning

  • Paper: None
  • Code: None

43. Scaling Local Self-Attention For Parameter Efficient Visual Backbones

Regularization

Regularizing Neural Networks via Adversarial Model Perturbation

SLAM

Differentiable SLAM-net: Learning Particle SLAM for Visual Navigation

Generalizing to the Open World: Deep Visual Odometry with Online Adaptation

Long-Tailed Distribution

Adversarial Robustness under Long-Tailed Distribution

Distribution Alignment: A Unified Framework for Long-tail Visual Recognition

Adaptive Class Suppression Loss for Long-Tail Object Detection

Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification

Data Augmentation

Scale-aware Automatic Augmentation for Object Detection

Unsupervised/Self-Supervised

Domain-Specific Suppression for Adaptive Object Detection

A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

Unsupervised Multi-Source Domain Adaptation for Person Re-Identification

Self-supervised Video Representation Learning by Context and Motion Decoupling

Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning

Spatially Consistent Representation Learning

VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples

Exploring Simple Siamese Representation Learning

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Semi-Supervised Learning

Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

Adaptive Consistency Regularization for Semi-Supervised Transfer Learning

Capsule Network

Capsule Network is Not More Robust than Convolutional Network

Image Classification

Correlated Input-Dependent Label Noise in Large-Scale Image Classification

2D Object Detection (Object Detection)

2D Object Detection

1. Scaled-YOLOv4: Scaling Cross Stage Partial Network

2. You Only Look One-level Feature

3. Sparse R-CNN: End-to-End Object Detection with Learnable Proposals

4. End-to-End Object Detection with Fully Convolutional Network

5. Dynamic Head: Unifying Object Detection Heads with Attentions

6. Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection

7. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

8. MobileDets: Searching for Object Detection Architectures for Mobile Accelerators

9. Tracking Pedestrian Heads in Dense Crowd

10. Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation

11. PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery

12. IQDet: Instance-wise Quality Distribution Sampling for Object Detection

13. Multi-Scale Aligned Distillation for Low-Resolution Detection

14. Adaptive Class Suppression Loss for Long-Tail Object Detection

15. VarifocalNet: An IoU-aware Dense Object Detector

16. OTA: Optimal Transport Assignment for Object Detection

17. Distilling Object Detectors via Decoupled Features

18. Robust and Accurate Object Detection via Adversarial Learning

19. OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection

20. Multiple Instance Active Learning for Object Detection

21. Towards Open World Object Detection

22. RankDetNet: Delving Into Ranking Constraints for Object Detection

Rotated Object Detection

23. Dense Label Encoding for Boundary Discontinuity Free Rotation Detection

24. ReDet: A Rotation-equivariant Detector for Aerial Object Detection

25. Beyond Bounding-Box: Convex-Hull Feature Adaptation for Oriented and Densely Packed Object Detection

Few-Shot Object Detection

26. Accurate Few-Shot Object Detection With Support-Query Mutual Guidance and Hybrid Loss

27. Adaptive Image Transformer for One-Shot Object Detection

28. Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection

29. Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection

30. FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding

31. Hallucination Improves Few-Shot Object Detection

32. Few-Shot Object Detection via Classification Refinement and Distractor Retreatment

33. Generalized Few-Shot Object Detection Without Forgetting

34. Transformation Invariant Few-Shot Object Detection

35. UniT: Unified Knowledge Transfer for Any-Shot Object Detection and Segmentation

36. Beyond Max-Margin: Class Margin Equilibrium for Few-Shot Object Detection

Semi-Supervised Object Detection

37. Points As Queries: Weakly Semi-Supervised Object Detection by Points

38. Data-Uncertainty Guided Multi-Phase Learning for Semi-Supervised Object Detection

39. Positive-Unlabeled Data Purification in the Wild for Object Detection

40. Interactive Self-Training With Mean Teachers for Semi-Supervised Object Detection

41. Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

42. Humble Teachers Teach Better Students for Semi-Supervised Object Detection

43. Interpolation-Based Semi-Supervised Learning for Object Detection

Domain-Adaptive Object Detection

44. Domain-Specific Suppression for Adaptive Object Detection

45. MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection

46. Unbiased Mean Teacher for Cross-Domain Object Detection

47. I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors

Self-Supervised Object Detection

48. There Is More Than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking With Sound by Distilling Multimodal Knowledge

49. Instance Localization for Self-supervised Detection Pretraining

Weakly Supervised Object Detection

50. Informative and Consistent Correspondence Mining for Cross-Domain Weakly Supervised Object Detection

51. DAP: Detection-Aware Pre-training with Weak Supervision

Others

52. Open-Vocabulary Object Detection Using Captions

53. Depth From Camera Motion and Object Detection

54. Unsupervised Object Detection With LIDAR Clues

55. GAIA: A Transfer Learning System of Object Detection That Fits Your Needs

56. General Instance Distillation for Object Detection

57. AQD: Towards Accurate Quantized Object Detection

58. Scale-Aware Automatic Augmentation for Object Detection

59. Equalization Loss v2: A New Gradient Balance Approach for Long-Tailed Object Detection

60. Class-Aware Robust Adversarial Training for Object Detection

61. Improved Handling of Motion Blur in Online Object Detection

62. Multiple Instance Active Learning for Object Detection

63. Neural Auto-Exposure for High-Dynamic Range Object Detection

64. Generalizable Pedestrian Detection: The Elephant in the Room

Single/Multi-Object Tracking

Single-Object Tracking

LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search

Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark

IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking

Graph Attention Tracking

Rotation Equivariant Siamese Networks for Tracking

Track to Detect and Segment: An Online Multi-Object Tracker

Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

Transformer Tracking

Multi-Object Tracking

Tracking Pedestrian Heads in Dense Crowd

Multiple Object Tracking with Correlation Learning

Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking

Learning a Proposal Classifier for Multiple Object Tracking

Track to Detect and Segment: An Online Multi-Object Tracker

Semantic Segmentation

1. HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation

2. Rethinking BiSeNet For Real-time Semantic Segmentation

3. Progressive Semantic Segmentation

4. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

5. Capturing Omni-Range Context for Omnidirectional Segmentation

6. Learning Statistical Texture for Semantic Segmentation

7. InverseForm: A Loss Function for Structured Boundary-Aware Segmentation

8. DCNAS: Densely Connected Neural Architecture Search for Semantic Image Segmentation

Weakly Supervised Semantic Segmentation

9. Railroad Is Not a Train: Saliency As Pseudo-Pixel Supervision for Weakly Supervised Semantic Segmentation

10. Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation

11. Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation

12. Embedded Discriminative Attention Mechanism for Weakly Supervised Semantic Segmentation

13. BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation

Semi-Supervised Semantic Segmentation

14. Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision

15. Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation

16. Semi-Supervised Semantic Segmentation With Directional Context-Aware Consistency

17. Semantic Segmentation With Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization

18. Three Ways To Improve Semantic Segmentation With Self-Supervised Depth Estimation

Domain-Adaptive Semantic Segmentation

19. Cluster, Split, Fuse, and Update: Meta-Learning for Open Compound Domain Adaptive Semantic Segmentation

20. Source-Free Domain Adaptation for Semantic Segmentation

21. Uncertainty Reduction for Model Adaptation in Semantic Segmentation

22. Self-Supervised Augmentation Consistency for Adapting Semantic Segmentation

23. RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening

24. Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization

25. MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation

26. Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation

27. Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation

28. DANNet: A One-Stage Domain Adaptation Network for Unsupervised Nighttime Semantic Segmentation

Few-Shot Semantic Segmentation

29. Scale-Aware Graph Neural Network for Few-Shot Semantic Segmentation

30. Anti-Aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation

Unsupervised Semantic Segmentation

31. PiCIE: Unsupervised Semantic Segmentation Using Invariance and Equivariance in Clustering

Video Semantic Segmentation

32. VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild

Others

33. Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations

34. Exploit Visual Dependency Relations for Semantic Segmentation

35. Revisiting Superpixels for Active Learning in Semantic Segmentation With Realistic Annotation Costs

36. PLOP: Learning without Forgetting for Continual Semantic Segmentation

37. 3D-to-2D Distillation for Indoor Scene Parsing

38. Bidirectional Projection Network for Cross Dimension Scene Understanding

39. PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation

Instance Segmentation

DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

Incremental Few-Shot Instance Segmentation

A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation

RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features

Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation

Multi-Scale Aligned Distillation for Low-Resolution Detection

Boundary IoU: Improving Object-Centric Image Segmentation Evaluation

Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers

Zero-shot instance segmentation (Not Sure)

Video Instance Segmentation

STMask: Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation

End-to-End Video Instance Segmentation with Transformers

Panoptic Segmentation

ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation

Part-aware Panoptic Segmentation

Exemplar-Based Open-Set Panoptic Segmentation Network

MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers

Panoptic Segmentation Forecasting

Fully Convolutional Networks for Panoptic Segmentation

Cross-View Regularization for Domain Adaptive Panoptic Segmentation

Medical Image Segmentation

1. Learning Calibrated Medical Image Segmentation via Multi-Rater Agreement Modeling

2. Every Annotation Counts: Multi-Label Deep Supervision for Medical Image Segmentation

3. FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space

4. DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation

5. DARCNN: Domain Adaptive Region-Based Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images

Video Object Segmentation

Learning Position and Target Consistency for Memory-based Video Object Segmentation

SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation

Interactive Video Object Segmentation

Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild

Saliency Detection

Uncertainty-aware Joint Salient Object and Camouflaged Object Detection

Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion

Camouflaged Object Detection

Uncertainty-aware Joint Salient Object and Camouflaged Object Detection

Co-Salient Object Detection

Group Collaborative Learning for Co-Salient Object Detection

Image Matting

Semantic Image Matting

Person Re-identification

Generalizable Person Re-identification with Relevance-aware Mixture of Experts

Unsupervised Multi-Source Domain Adaptation for Person Re-Identification

Combined Depth Space based Architecture Search For Person Re-identification

Person Search

Anchor-Free Person Search

Video Understanding/Action Recognition

Temporal-Relational CrossTransformers for Few-Shot Action Recognition

FrameExit: Conditional Early Exiting for Efficient Video Recognition

No frame left behind: Full Video Action Recognition

Learning Salient Boundary Feature for Anchor-free Temporal Action Localization

Temporal Context Aggregation Network for Temporal Action Proposal Refinement

ACTION-Net: Multipath Excitation for Action Recognition

Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning

TDN: Temporal Difference Networks for Efficient Action Recognition

Face Recognition

A 3D GAN for Improved Large-pose Facial Recognition

MagFace: A Universal Representation for Face Recognition and Quality Assessment

WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition

When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework

Face Detection

HLA-Face: Joint High-Low Adaptation for Low Light Face Detection

CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement

Face Anti-Spoofing

Cross Modal Focal Loss for RGBD Face Anti-Spoofing

Deepfake Detection

Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain

Multi-attentional Deepfake Detection

Face Age Estimation

Continuous Face Aging via Self-estimated Residual Age Embedding

PML: Progressive Margin Loss for Long-tailed Age Classification

Facial Expression Recognition

Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition

Deepfakes

MagDR: Mask-guided Detection and Reconstruction for Defending Deepfakes

Human Parsing

Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing

2D/3D Human Pose Estimation

2D Human Pose Estimation

ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search

  • Paper: https://arxiv.org/abs/2105.10154
  • Code: None

When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks

Pose Recognition with Cascade Transformers

DCPose: Deep Dual Consecutive Network for Human Pose Estimation

3D Human Pose Estimation

End-to-End Human Pose and Mesh Reconstruction with Transformers

PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation

Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration

Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks

HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation

Animal Pose Estimation

From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation

Hand Pose Estimation

Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time

Human Volumetric Capture

POSEFusion: Pose-guided Selective Fusion for Single-view Human Volumetric Capture

Scene Text Detection

Fourier Contour Embedding for Arbitrary-Shaped Text Detection

Scene Text Recognition

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Image Compression

Checkerboard Context Model for Efficient Learned Image Compression

Slimmable Compressive Autoencoders for Practical Neural Image Compression

Attention-guided Image Compression by Deep Reconstruction of Compressive Sensed Saliency Skeleton

Model Compression/Pruning/Quantization

Teachers Do More Than Teach: Compressing Image-to-Image Models

Model Pruning

Dynamic Slimmable Network

Model Quantization

Network Quantization with Element-wise Gradient Scaling

Zero-shot Adversarial Quantization

Learnable Companding Quantization for Accurate Low-bit Neural Networks

Knowledge Distillation

Distilling Knowledge via Knowledge Review

Distilling Object Detectors via Decoupled Features

Super-Resolution

Image Super-Resolution with Non-Local Sparse Attention

Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline

ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic

AdderSR: Towards Energy Efficient Image Super-Resolution

Dehazing

Contrastive Learning for Compact Single Image Dehazing

Video Super-Resolution

Temporal Modulation Network for Controllable Space-Time Video Super-Resolution

Image Restoration

Multi-Stage Progressive Image Restoration

Image Inpainting

PD-GAN: Probabilistic Diverse GAN for Image Inpainting

TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations

Image Editing

StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

High-Fidelity and Arbitrary Face Editing

Anycost GANs for Interactive Image Synthesis and Editing

PISE: Person Image Synthesis and Editing with Decoupled GAN

DeFLOCNet: Deep Image Editing via Flexible Low-level Controls

Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

  • Paper: None
  • Code: None

Image Captioning

Towards Accurate Text-based Image Captioning with Content Diversity Exploration

Font Generation

DG-Font: Deformable Generative Networks for Unsupervised Font Generation

Image Matching

LoFTR: Detector-Free Local Feature Matching with Transformers

Convolutional Hough Matching Networks

Image Blending

Bridging the Visual Gap: Wide-Range Image Blending

Reflection Removal

Robust Reflection Removal with Reflection-free Flash-only Cues

3D Point Cloud Classification

Equivariant Point Network for 3D Point Cloud Analysis

PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds

3D Object Detection

3D-MAN: 3D Multi-frame Attention Network for Object Detection

Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds

HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection

LiDAR R-CNN: An Efficient and Universal 3D Object Detector

M3DSSD: Monocular 3D Single Stage Object Detector

SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud

Center-based 3D Object Detection and Tracking

Categorical Depth Distribution Network for Monocular 3D Object Detection

3D Semantic Segmentation

Bidirectional Projection Network for Cross Dimension Scene Understanding

Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion

Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation

Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges

3D Panoptic Segmentation

Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation

3D Object Tracking

Center-based 3D Object Detection and Tracking

3D Point Cloud Registration

ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning

PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency

PREDATOR: Registration of 3D Point Clouds with Low Overlap

3D Point Cloud Completion

Unsupervised 3D Shape Completion through GAN Inversion

Variational Relational Point Completion Network

Style-based Point Generator with Adversarial Rendering for Point Cloud Completion

3D Reconstruction

Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection

Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction

NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video

6D Pose Estimation

FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism

GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation

FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation

Camera Pose Estimation

Back to the Feature: Learning Robust Camera Localization from Pixels to Pose

Depth Estimation

S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation

Beyond Image to Depth: Improving Depth Prediction using Echoes

S3: Learnable Sparse Signal Superdensity for Guided Depth Estimation

Depth from Camera Motion and Object Detection

Stereo Matching

A Decomposition Model for Stereo Matching

Optical Flow Estimation (Flow Estimation)

Self-Supervised Multi-Frame Monocular Scene Flow

RAFT-3D: Scene Flow using Rigid-Motion Embeddings

Learning Optical Flow From Still Images

FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds

Lane Detection

Focus on Local: Detecting Lane Marker from Bottom Up via Key Point

Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection

Trajectory Prediction

Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction

Crowd Counting

Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark

Adversarial Examples

Enhancing the Transferability of Adversarial Attacks through Variance Tuning

LiBRe: A Practical Bayesian Approach to Adversarial Detection

Natural Adversarial Examples

Image Retrieval

StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval

QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval

Video Retrieval

On Semantic Similarity in Video Retrieval

Cross-modal Retrieval

Cross-Modal Center Loss for 3D Cross-Modal Retrieval

Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers

Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning

Zero-Shot Learning

Counterfactual Zero-Shot and Open-Set Visual Recognition

Federated Learning

FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space

Video Frame Interpolation

CDFI: Compression-Driven Network Design for Frame Interpolation

FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation

Visual Reasoning

Transformation Driven Visual Reasoning

Image Synthesis

GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields

Taming Transformers for High-Resolution Image Synthesis

View Synthesis

Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes

Self-Supervised Visibility Learning for Novel View Synthesis

NeX: Real-time View Synthesis with Neural Basis Expansion

Style Transfer

Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer

Layout Generation

LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity

  • Paper: None
  • Code: None

Variational Transformer Networks for Layout Generation

Domain Generalization

Generalization on Unseen Domains via Inference-time Label-Preserving Target Projections

Generalizable Person Re-identification with Relevance-aware Mixture of Experts

RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening

Adaptive Methods for Real-World Domain Generalization

FSDR: Frequency Space Domain Randomization for Domain Generalization

Domain Adaptation

Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation

Domain Consensus Clustering for Universal Domain Adaptation

Open-Set

Towards Open World Object Detection

Exemplar-Based Open-Set Panoptic Segmentation Network

Learning Placeholders for Open-Set Recognition

Adversarial Attack

IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking

Human-Object Interaction (HOI) Detection

HOTR: End-to-End Human-Object Interaction Detection with Transformers

Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information

Reformulating HOI Detection as Adaptive Set Prediction

Detecting Human-Object Interaction via Fabricated Compositional Learning

End-to-End Human Object Interaction Detection with HOI Transformer

Shadow Removal

Auto-Exposure Fusion for Single-Image Shadow Removal

Virtual Try-On

Parser-Free Virtual Try-on via Distilling Appearance Flows

Label Noise

A Second-Order Approach to Learning with Instance-Dependent Label Noise

Video Stabilization

Real-Time Selfie Video Stabilization

Datasets

Tracking Pedestrian Heads in Dense Crowd

Part-aware Panoptic Segmentation

Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network

Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark

Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets

Paper download link:

ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation

Learning To Count Everything

Semantic Image Matting

Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline

Visual Semantic Role Labeling for Video Understanding

VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild

Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark

Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food

Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges

When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework

Depth from Camera Motion and Object Detection

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

Scan2Cap: Context-aware Dense Captioning in RGB-D Scans

Others

Fast and Accurate Model Scaling

Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos

Omnimatte: Associating Objects and Their Effects in Video

Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets

Motion Representations for Articulated Animation

Deep Lucas-Kanade Homography for Multimodal Image Alignment

Skip-Convolutions for Efficient Video Processing

KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control

Learning To Count Everything

SOLD2: Self-supervised Occlusion-aware Line Description and Detection

Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression

LEAP: Learning Articulated Occupancy of People

Visual Semantic Role Labeling for Video Understanding

UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles

Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning

Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction

Towards High Fidelity Face Relighting with Realistic Shadows

BRepNet: A topological message passing system for solid models

Visually Informed Binaural Audio Generation without Binaural Audios

Exploring intermediate representation for monocular vehicle pose estimation

Tuning IR-cut Filter for Illumination-aware Spectral Reconstruction from RGB

Invertible Image Signal Processing

Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling

SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences

Embedding Transfer with Label Relaxation for Improved Metric Learning

Picasso: A CUDA-based Library for Deep Learning over 3D Meshes

Meta-Mining Discriminative Samples for Kinship Verification

Cloud2Curve: Generation and Vectorization of Parametric Sketches

TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events

Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution

ACRE: Abstract Causal REasoning Beyond Covariation

Confluent Vessel Trees with Accurate Bifurcations

Few-Shot Human Motion Transfer by Personalized Geometry and Texture Modeling

Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks

Knowledge Evolution in Neural Networks

Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning

SGP: Self-supervised Geometric Perception

Diffusion Probabilistic Models for 3D Point Cloud Generation

Scan2Cap: Context-aware Dense Captioning in RGB-D Scans

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

To Be Added (TODO)

Not Sure (Acceptance Unconfirmed)

CT Film Recovery via Disentangling Geometric Deformation and Photometric Degradation: Simulated Datasets and Deep Models

Toward Explainable Reflection Removal with Distilling and Model Uncertainty

DeepOIS: Gyroscope-Guided Deep Optical Image Stabilizer Compensation

Exploring Adversarial Fake Images on Face Manifold

Uncertainty-Aware Semi-Supervised Crowd Counting via Consistency-Regularized Surrogate Task

Temporal Contrastive Graph for Self-supervised Video Representation Learning

Boosting Monocular Depth Estimation Models to High-Resolution via Context-Aware Patching

Fast and Memory-Efficient Compact Bilinear Pooling

Identification of Empty Shelves in Supermarkets using Domain-inspired Features with Structural Support Vector Machine

Estimating A Child's Growth Potential From Cephalometric X-Ray Image via Morphology-Aware Interactive Keypoint Estimation

https://github.com/ShaoQiangShen/CVPR2021

https://github.com/gillesflash/CVPR2021

https://github.com/anonymous-submission1991/BaLeNAS

https://github.com/cvpr2021dcb/cvpr2021dcb

https://github.com/anonymousauthorCV/CVPR2021_PaperID_8578

https://github.com/AldrichZeng/FreqPrune

https://github.com/Anonymous-AdvCAM/Anonymous-AdvCAM

https://github.com/ddfss/datadrive-fss