
Awesome Talking Face

This is a repository for organizing papers, code, and other resources related to talking face/head generation. Most papers are linked to the PDF provided by arXiv or OpenAccess. However, some papers require an academic license to browse, e.g. IEEE, Springer, and Elsevier journals.

🔆 This project is still ongoing; pull requests are welcome!

If you have any suggestions (missing papers, new papers, key researchers, or typos), please feel free to edit and open a pull request. Even just letting me know the titles of papers is a big contribution. You can do this by opening an issue or contacting me directly via email.

⭐ If you find this repo useful, please star it!

2022.09 Update!

Thanks for the PRs from everybody! From now on, I'll occasionally include papers about video-driven talking face generation, since the community has been folding video-driven methods into the talking face generation scope, although the task was originally termed Face Reenactment.

So, if you are looking for video-driven talking face generation, I would suggest starring this repo and also searching for Face Reenactment; you'll find more :)

One more thing: please correct me if you find any paper noted as an arXiv paper that has since been accepted to a conference or journal.

2021.11 Update!

I updated a batch of papers that appeared in the past few months. In this repo, I originally intended to cover audio-driven talking face generation works. However, I found several text-based research works that are also very interesting, so I included them here as well. Enjoy!

TO DO LIST

  • Main paper list
  • Add paper links
  • Add code links where available
  • Add project pages where available
  • Datasets and surveys

Papers

2D Video - Person independent

2024

  • DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation [arXiv 2024] Paper Code ProjectPage
  • Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization [arXiv 2024] Paper
  • MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting [arXiv 2024] Paper Code
  • 3D-Aware Text-driven Talking Avatar Generation [ECCV 2024] Paper
  • LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details [arXiv 2024] Paper
  • TalkinNeRF: Animatable Neural Fields for Full-Body Talking Humans [ECCVW 2024] Paper
  • JoyHallo: Digital human model for Mandarin [arXiv 2024] Paper Code
  • JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation [BMVC 2024] Paper
  • StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads [TPAMI 2024] Paper
  • DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures [CVPRW 2024] Paper
  • EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion [arXiv 2024] Paper
  • SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model [arXiv 2024] Paper
  • SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing [arXiv 2024] Paper
  • Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency [arXiv 2024] Paper ProjectPage
  • PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation [arXiv 2024] Paper
  • CyberHost: Taming Audio-driven Avatar Diffusion Model with Region Codebook Attention [arXiv 2024] Paper ProjectPage
  • TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation [arXiv 2024] Paper
  • S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis [arXiv 2024] Paper
  • FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model [arXiv 2024] Paper
  • LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control [arXiv 2024] Paper ProjectPage Code
  • High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model [arXiv 2024] Paper
  • Landmark-guided Diffusion Model for High-fidelity and Temporally Coherent Talking Head Generation [arXiv 2024] Paper
  • LinguaLinker: Audio-Driven Portraits Animation with Implicit Facial Control Enhancement [arXiv 2024] Paper ProjectPage Code
  • Learning Online Scale Transformation for Talking Head Video Generation [arXiv 2024] Paper
  • EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions [arXiv 2024] Paper ProjectPage GitHub
  • Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation [arXiv 2024] Paper ProjectPage GitHub
  • RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network [arXiv 2024] Paper
  • Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation [arXiv 2024] Paper
  • Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation [arXiv 2024] Paper ProjectPage
  • Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement [arXiv 2024] Paper ProjectPage
  • Controllable Talking Face Generation by Implicit Facial Keypoints Editing [arXiv 2024] Paper
  • InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation [arXiv 2024] Paper ProjectPage
  • Faces that Speak: Jointly Synthesising Talking Face and Speech from Text [arXiv 2024] Paper ProjectPage
  • Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head Generation [arXiv 2024] Paper
  • SwapTalk: Audio-Driven Talking Face Generation with One-Shot Customization in Latent Space [arXiv 2024] Paper ProjectPage
  • AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding [arXiv 2024] Paper Code ProjectPage
  • NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior [CVPR 2024 Workshop] Paper Code ProjectPage
  • Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation [CVPR 2024 Workshop] Paper
  • EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars [arXiv 2024] Paper ProjectPage
  • GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting [arXiv 2024] Paper
  • VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time [arXiv 2024] Paper ProjectPage
  • THQA: A Perceptual Quality Assessment Database for Talking Heads [arXiv 2024] Paper Code
  • Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior [arXiv 2024] Paper Code ProjectPage
  • EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis [arXiv 2024] Paper Code ProjectPage
  • AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animations [arXiv 2024] Paper Code
  • MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation [arXiv 2024] Paper ProjectPage
  • Superior and Pragmatic Talking Face Generation with Teacher-Student Framework [arXiv 2024] Paper ProjectPage
  • X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention [arXiv 2024] Paper
  • Adaptive Super Resolution For One-Shot Talking-Head Generation [arXiv 2024] Paper
  • Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style [arXiv 2024] Paper
  • FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization [arXiv 2024] Paper
  • FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio [arXiv 2024] Paper Code
  • Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis [CVPR 2024] Paper Code
  • EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions [arXiv 2024] Paper ProjectPage Code
  • G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment [arXiv 2024] Paper
  • Context-aware Talking Face Video Generation [arXiv 2024] Paper
  • EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation [arXiv 2024] Paper ProjectPage Code
  • GPAvatar: Generalizable and Precise Head Avatar from Image(s) [ICLR 2024] Paper Code
  • Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis [ICLR 2024] Paper
  • EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model [ICASSP 2024] Paper
  • CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer [WACV 2024] Paper Code

2023

  • VectorTalker: SVG Talking Face Generation with Progressive Vectorisation [arXiv 2023] Paper
  • DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models [arXiv 2023] Paper ProjectPage
  • GMTalker: Gaussian Mixture based Emotional talking video Portraits [arXiv 2023] Paper ProjectPage
  • DiT-Head: High-Resolution Talking Head Synthesis using Diffusion Transformers [arXiv 2023] Paper
  • R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning [arXiv 2023] Paper
  • FT2TF: First-Person Statement Text-To-Talking Face Generation [arXiv 2023] Paper
  • VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior [arXiv 2023] Paper Code ProjectPage
  • SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis [arXiv 2023] Paper Code ProjectPage
  • GAIA: Zero-shot Talking Avatar Generation [arXiv 2023] Paper
  • Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis [ICCV 2023] Paper ProjectPage Code
  • Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation [ICCV 2023] Paper ProjectPage Code
  • MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions [ICCV 2023] Paper ProjectPage
  • ToonTalker: Cross-Domain Face Reenactment [ICCV 2023] Paper
  • Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation [ICCV 2023] Paper ProjectPage Code
  • EMMN: Emotional Motion Memory Network for Audio-driven Emotional Talking Face Generation [ICCV 2023] Paper
  • Emotional Listener Portrait: Realistic Listener Motion Simulation in Conversation [ICCV 2023] Paper
  • Instruct-NeuralTalker: Editing Audio-Driven Talking Radiance Fields with Instructions [arXiv 2023] Paper
  • Plug the Leaks: Advancing Audio-driven Talking Face Generation by Preventing Unintended Information Flow [arXiv 2023] Paper
  • Reprogramming Audio-driven Talking Face Synthesis into Text-driven [arXiv 2023] Paper
  • Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis [TCSVT 2023] Paper
  • Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks [arXiv 2023] Paper
  • Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis [arXiv 2023] Paper
  • SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation [CVPR 2023] Paper Code
  • MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation [CVPR 2023] Paper ProjectPage Code
  • Implicit Neural Head Synthesis via Controllable Local Deformation Fields [CVPR 2023] Paper
  • LipFormer: High-fidelity and Generalizable Talking Face Generation with A Pre-learned Facial Codebook [CVPR 2023] Paper
  • GANHead: Towards Generative Animatable Neural Head Avatars [CVPR 2023] Paper ProjectPage Code
  • Parametric Implicit Face Representation for Audio-Driven Facial Reenactment [CVPR 2023] Paper
  • Identity-Preserving Talking Face Generation with Landmark and Appearance Priors [CVPR 2023] Paper Code
  • StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator [CVPR 2023] Paper ProjectPage Code
  • Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos [arXiv 2023] Paper ProjectPage
  • Multimodal-driven Talking Face Generation, Face Swapping, Diffusion Model [arXiv 2023] Paper
  • High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning [CVPR 2023] Paper
  • StyleLipSync: Style-based Personalized Lip-sync Video Generation [arXiv 2023] Paper ProjectPage Code
  • GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation [arXiv 2023] Paper ProjectPage
  • High-Fidelity and Freely Controllable Talking Head Video Generation [CVPR 2023] Paper ProjectPage
  • One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field [CVPR 2023] Paper ProjectPage
  • Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert [CVPR 2023] Paper Code
  • Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations [arXiv 2023] Paper
  • That's What I Said: Fully-Controllable Talking Face Generation [arXiv 2023] Paper ProjectPage
  • Emotionally Enhanced Talking Face Generation [arXiv 2023] Paper Code ProjectPage
  • A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation [MLSys Workshop 2023] Paper
  • TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles [arXiv 2023] Paper
  • FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions [ICME 2023] Paper
  • DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder [arXiv 2023] Paper ProjectPage
  • OPT: One-shot Pose-controllable Talking Head Generation [ICASSP 2023] Paper
  • DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions [ICASSP 2023] Paper Code ProjectPage
  • GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis [ICLR 2023] Paper Code ProjectPage
  • OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering [CVPR 2023] Paper Code
  • Style Transfer for 2D Talking Head Animation [arXiv 2023] Paper
  • READ Avatars: Realistic Emotion-controllable Audio Driven Avatars [arXiv 2023] Paper
  • On the Audio-visual Synchronization for Lip-to-Speech Synthesis [arXiv 2023] Paper
  • DiffTalk: Crafting Diffusion Models for Generalized Talking Head Synthesis [CVPR 2023] Paper
  • Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation [arXiv 2023] Paper ProjectPage
  • StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles [AAAI 2023] Paper Code
  • Audio-Visual Face Reenactment [WACV 2023] Paper ProjectPage Code

2022

  • Memories are One-to-Many Mapping Alleviators in Talking Face Generation [arXiv 2022] Paper ProjectPage
  • Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers [SIGGRAPH Asia 2022] Paper
  • Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors [arXiv 2022] Paper ProjectPage
  • Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis [CVPR 2022] Paper ProjectPage
  • SPACE: Speech-driven Portrait Animation with Controllable Expression [arXiv 2022] Paper ProjectPage
  • Compressing Video Calls using Synthetic Talking Heads [BMVC 2022] Paper ProjectPage
  • Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement [arXiv 2022] Paper
  • StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation [arXiv 2022] Paper
  • Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control [arXiv 2022] Paper
  • EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model [SIGGRAPH 2022] Paper
  • Talking Head from Speech Audio using a Pre-trained Image Generator [ACM MM 2022] Paper
  • Latent Image Animator: Learning to Animate Images via Latent Space Navigation [ICLR 2022] Paper ProjectPage (note: this page has auto-play music) Code
  • Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition [arXiv 2022] Paper ProjectPage Code
  • Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis [ECCV 2022] Paper ProjectPage Code
  • Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation [ECCV 2022] Paper ProjectPage Code
  • Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary [ICASSP 2022] Paper ProjectPage Code
  • StableFace: Analyzing and Improving Motion Stability for Talking Face Generation [arXiv 2022] Paper ProjectPage
  • Emotion-Controllable Generalized Talking Face Generation [IJCAI 2022] Paper
  • StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN [arXiv 2022] Paper Code ProjectPage
  • DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering [arXiv 2022] Paper
  • Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions [arXiv 2022] Paper
  • Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels [TMM 2022] Paper
  • Depth-Aware Generative Adversarial Network for Talking Head Video Generation [CVPR 2022] Paper ProjectPage Code
  • Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning [CVPR 2022] Paper Code ProjectPage
  • Expressive Talking Head Generation with Granular Audio-Visual Control [CVPR 2022] Paper
  • Talking Face Generation with Multilingual TTS [CVPR 2022 Demo] Paper DemoPage
  • SyncTalkFace: Talking Face Generation with Precise Lip-syncing via Audio-Lip Memory [AAAI 2022] Paper

2021

  • Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation [SIGGRAPH Asia 2021] Paper Code
  • Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis [ACMMM 2021] Paper Code
  • AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis [ICCV 2021] Paper Code
  • FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning [ICCV 2021] Paper Code
  • Learned Spatial Representations for Few-shot Talking-Head Synthesis [ICCV 2021] Paper
  • Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation [CVPR 2021] Paper Code ProjectPage
  • One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing [CVPR 2021] Paper
  • Audio-Driven Emotional Video Portraits [CVPR 2021] Paper Code
  • AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person [arXiv 2021] Paper
  • Talking Head Generation with Audio and Speech Related Facial Action Units [BMVC 2021] Paper
  • Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion [IJCAI 2021] Paper
  • Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation [AAAI 2021] Paper
  • Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary [arXiv 2021] Paper Code

2020

  • Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose [arXiv 2020] Paper Code
  • A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild [ACMMM 2020] Paper Code
  • Talking Face Generation with Expression-Tailored Generative Adversarial Network [ACMMM 2020] Paper
  • Speech Driven Talking Face Generation from a Single Image and an Emotion Condition [arXiv 2020] Paper Code
  • A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors [ICPR 2020] Paper
  • Everybody's Talkin': Let Me Talk as You Want [arXiv 2020] Paper
  • HeadGAN: Video-and-Audio-Driven Talking Head Synthesis [arXiv 2020] Paper
  • Talking-head Generation with Rhythmic Head Motion [ECCV 2020] Paper
  • Neural Voice Puppetry: Audio-driven Facial Reenactment [ECCV 2020] Paper ProjectPage Code
  • Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis [CVPR 2020] Paper
  • Robust One Shot Audio to Video Generation [CVPRW 2020] Paper
  • MakeItTalk: Speaker-Aware Talking Head Animation [SIGGRAPH Asia 2020] Paper Code
  • FLNet: Landmark Driven Fetching and Learning Network for Faithful Talking Facial Animation Synthesis [AAAI 2020] Paper
  • Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose [AAAI 2020] Paper
  • Photorealistic Lip Sync with Adversarial Temporal Convolutional [arXiv 2020] Paper
  • Speech-driven Facial Animation Using Polynomial Fusion of Features [arXiv 2020] Paper
  • Animating Face using Disentangled Audio Representations [WACV 2020] Paper

Before 2020

  • Realistic Speech-Driven Facial Animation with GANs [IJCV 2019] Paper ProjectPage
  • Few-Shot Adversarial Learning of Realistic Neural Talking Head Models [ICCV 2019] Paper Code
  • Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss [CVPR 2019] Paper Code
  • Talking Face Generation by Adversarially Disentangled Audio-Visual Representation [AAAI 2019] Paper Code ProjectPage
  • Lip Movements Generation at a Glance [ECCV 2018] Paper
  • X2Face: A network for controlling face generation using images, audio, and pose codes [ECCV 2018] Paper Code ProjectPage
  • Talking Face Generation by Conditional Recurrent Adversarial Network [IJCAI 2019] Paper Code
  • Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks [arXiv 2018] Paper
  • High-Resolution Talking Face Generation via Mutual Information Approximation [arXiv 2018] Paper
  • Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network [arXiv 2018] Paper
  • You said that? [BMVC 2017] Paper

2D Video - Person dependent

  • Continuously Controllable Facial Expression Editing in Talking Face Videos [TAFFC 2023] Paper ProjectPage
  • Synthesizing Obama: Learning Lip Sync from Audio [SIGGRAPH 2017] Paper ProjectPage
  • Photorealistic Adaptation and Interpolation of Facial Expressions Using HMMs and AAMs for Audio-Visual Speech Synthesis [ICIP 2017] Paper
  • HMM-Based Photo-Realistic Talking Face Synthesis Using Facial Expression Parameter Mapping with Deep Neural Networks [Journal of Computer and Communications 2017] Paper
  • ObamaNet: Photo-realistic lip-sync from text [arXiv 2017] Paper
  • A deep bidirectional LSTM approach for video-realistic talking head [Multimedia Tools Appl 2015] Paper
  • Photo-Realistic Expressive Text to Talking Head Synthesis [Interspeech 2013] Paper
  • Photo-Real Talking Head with Deep Bidirectional LSTM [ICASSP 2015] Paper
  • Expressive Speech-Driven Facial Animation [TOG 2005] Paper

3D Animation

  • MimicTalk: Mimicking a personalized and expressive 3D talking face in few minutes [NeurIPS 2024] Paper Code ProjectPage
  • ScanTalk: 3D Talking Heads from Unregistered Scans [ECCV 2024] Paper Code
  • Audio-Driven Emotional 3D Talking-Head Generation [arXiv 2024] Paper
  • Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads [arXiv 2024] Paper
  • 3DFacePolicy: Speech-Driven 3D Facial Animation with Diffusion Policy [arXiv 2024] Paper
  • ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE [arXiv 2024] Paper Code
  • KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding [ECCV 2024] Paper Code
  • EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face with Mesh Attention [arXiv 2024] Paper
  • DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation [arXiv 2024] Paper
  • JambaTalk: Speech-Driven 3D Talking Head Generation Based on Hybrid Transformer-Mamba Model [arXiv 2024] Paper
  • GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer [arXiv 2024] Paper
  • UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model [arXiv 2024] Paper
  • EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head [arXiv 2024] Paper
  • EmoFace: Audio-driven Emotional 3D Face Animation [arXiv 2024] Paper Code
  • MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset [Interspeech 2024] Paper ProjectPage
  • 3D Gaussian Blendshapes for Head Avatar Animation [SIGGRAPH 2024] Paper
  • CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation [arXiv 2024] Paper
  • GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting [arXiv 2024] Paper
  • Learn2Talk: 3D Talking Face Learns from 2D Talking Face [arXiv 2024] Paper ProjectPage
  • Beyond Talking -- Generating Holistic 3D Human Dyadic Motion for Communication [arXiv 2024] Paper
  • AnimateMe: 4D Facial Expressions via Diffusion Models [arXiv 2024] Paper
  • EmoVOCA: Speech-Driven Emotional 3D Talking Heads [arXiv 2024] Paper
  • FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models [CVPR 2024] Paper Code ProjectPage
  • AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation [arXiv 2024] Paper
  • DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer [arXiv 2024] Paper Code
  • Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance [arXiv 2024] Paper ProjectPage
  • EMOTE: Emotional Speech-Driven Animation with Content-Emotion Disentanglement [SIGGRAPH Asia 2023] Paper ProjectPage
  • PMMTalk: Speech-Driven 3D Facial Animation from Complementary Pseudo Multi-modal Features [arXiv] Paper
  • 3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing [arXiv 2023] Paper Code ProjectPage
  • Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications [arXiv 2023] Paper
  • DiffusionTalker: Personalization and Acceleration for Speech-Driven 3D Face Diffuser [arXiv 2023] Paper
  • DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models [arXiv 2023] Paper ProjectPage Code
  • Imitator: Personalized Speech-driven 3D Facial Animation [ICCV 2023] Paper ProjectPage Code
  • Speech4Mesh: Speech-Assisted Monocular 3D Facial Reconstruction for Speech-Driven 3D Facial Animation [ICCV 2023] Paper
  • Semi-supervised Speech-driven 3D Facial Animation via Cross-modal Encoding [ICCV 2023] Paper
  • Audio-Driven 3D Facial Animation from In-the-Wild Videos [arXiv 2023] Paper ProjectPage
  • EmoTalk: Speech-driven emotional disentanglement for 3D face animation [ICCV 2023] Paper ProjectPage
  • FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning [arXiv 2023] Paper Code ProjectPage
  • Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertices Attention [arXiv 2023] Paper
  • Learning Audio-Driven Viseme Dynamics for 3D Face Animation [arXiv 2023] Paper ProjectPage
  • CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior [CVPR 2023] Paper ProjectPage
  • Expressive Speech-driven Facial Animation with controllable emotions [arXiv 2023] Paper
  • Imitator: Personalized Speech-driven 3D Facial Animation [arXiv 2022] Paper ProjectPage
  • PV3D: A 3D Generative Model for Portrait Video Generation [arXiv 2022] Paper ProjectPage
  • Neural Emotion Director: Speech-preserving semantic control of facial expressions in “in-the-wild” videos [CVPR 2022] Paper Code
  • FaceFormer: Speech-Driven 3D Facial Animation with Transformers [CVPR 2022] Paper Code ProjectPage
  • LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization [CVPR 2021] Paper
  • MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [ICCV 2021] Paper
  • AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis [ICCV 2021] Paper Code
  • 3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head [arXiv 2021] Paper
  • Modality Dropout for Improved Performance-driven Talking Faces [ICMI 2020] Paper
  • Audio- and Gaze-driven Facial Animation of Codec Avatars [arXiv 2020] Paper
  • Capture, Learning, and Synthesis of 3D Speaking Styles [CVPR 2019] Paper
  • VisemeNet: Audio-Driven Animator-Centric Speech Animation [TOG 2018] Paper
  • Speech-Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks [TAC 2018] Paper
  • End-to-end Learning for 3D Facial Animation from Speech [ICMI 2018] Paper
  • Visual Speech Emotion Conversion using Deep Learning for 3D Talking Head [MMAC 2018]
  • A Deep Learning Approach for Generalized Speech Animation [SIGGRAPH 2017] Paper
  • Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion [TOG 2017] Paper
  • Speech-driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach [CVPR 2017]
  • Expressive Speech Driven Talking Avatar Synthesis with DBLSTM using Limited Amount of Emotional Bimodal Data [Interspeech 2016] Paper
  • Real-Time Speech-Driven Face Animation With Expressions Using Neural Networks [TONN 2012] Paper
  • Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar [SIST 2010] Paper

Datasets & Benchmark

Survey

  • A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing [arXiv 2024] Paper
  • From Pixels to Portraits: A Comprehensive Survey of Talking Head Generation Techniques and Applications [arXiv 2023] Paper
  • Deep Learning for Visual Speech Analysis: A Survey [arXiv 2022] Paper
  • What comprises a good talking-head video generation?: A Survey and Benchmark [arXiv 2020] Paper

Colabs