Demo Code for "Talking Head(?) Anime from a Single Image 4: Improved Model and Its Distillation"

This repository contains demo programs for the "Talking Head(?) Anime from a Single Image 4: Improved Model and Its Distillation" project. Roughly, the project is about a machine learning model that can animate an anime character given only one image. However, the model is too slow to run in real time. So, the project also proposes an algorithm that uses the model to train a small machine learning model, specialized to one character image, that can animate the character in real time.

This demo code has two parts.

  • Improved model. This part gives a model similar to Version 3 of the project. It has one demo program:

    • The full_manual_poser allows the user to manipulate a character's facial expression and body rotation through a graphical user interface.

    There are no real-time demos because the new model is too slow for that.

  • Distillation. This part allows the user to train small models (which we will refer to as student models) to mimic the behavior of the full system with regard to a specific character image. It also allows the user to run these models under various interfaces. The demo programs are:

    • distill trains a student model given a configuration file, a $512 \times 512$ RGBA character image, and a mask of facial organs.
    • distiller_ui provides a user-friendly interface to distill, allowing you to create training configurations and providing useful documentation.
    • character_model_manual_poser allows the user to control trained student models with a graphical user interface.
    • character_model_ifacialmocap_puppeteer allows the user to control trained student models with their facial movement, which is captured by the iFacialMocap software. To run this software, you must have an iOS device and, of course, iFacialMocap.
    • character_model_mediapipe_puppeteer allows the user to control trained student models with their facial movement, which is captured by a web camera and processed by the Mediapipe FaceLandmarker model. To run this software, you need a web camera.

Preemptive FAQs

What is the program to control character images with my facial movement?

There is no such program in this release. If you want one, try the ifacialmocap_puppeteer of Version 3.

OK. I'm confused. Isn't your work about easy VTubing? Are you saying this release cannot do it?

NO. This release does it in a more complicated way. In order to control an image, you need to create a "student model." It is a small (< 2MB) and fast machine learning model that knows how to animate that particular image. Then, the student model can be controlled with facial movement. You can find two student models in the data/character_models directory. The two demos on the project website feature 13 student models.

So, for this release, you can control only these few characters in real time?

No. You can create your own student models.

How do I create this student model then?

  1. You prepare your character image according to the "Constraints on Input Images" section below.
  2. You prepare a black-and-white mask image that covers the eyes and the mouth of the character, like this image. You can see how I made it with GIMP by inspecting this GIMP file.
  3. You use distiller_ui to create a configuration file that specifies how the student model should be trained.
  4. You use distiller_ui or distill to start the training process.
  5. You wait several tens of hours for the student model to finish training. Last time I tried, it took about 30 hours on a computer with an Nvidia RTX A6000 GPU.
  6. After that, you can control the student model with character_model_ifacialmocap_puppeteer and character_model_mediapipe_puppeteer.

Why is this release so hard to use?

Version 3 is arguably easier to use because you can give it an image and control it with your facial movement immediately. However, I was not satisfied with its image quality and speed.

In this release, I explore a new way of doing things. I added a new preprocessing stage (i.e., training the student models) that has to be done one time per character image. It allows the image to be animated much faster at a higher image quality level.

In other words, it makes the user's life difficult but the engineer/researcher happy. Patient users who are willing to go through the steps, though, would be rewarded with faster animation.

Can I use a student model from a web browser?

No. A student model created by distill is a PyTorch model, which cannot run directly in the browser. It needs to be converted to the appropriate format (TensorFlow.js) first, and the web demos use the converted models. However, the conversion code is not included in this repository. I will not release it unless I change my mind.

Hardware Requirements

All programs require a recent and powerful Nvidia GPU to run. I developed the programs on a machine with an Nvidia RTX A6000. However, anything after the GeForce RTX 2080 should be fine.
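Once the Python environment described later in this document is set up, a quick way to see which GPU PyTorch would use is the sketch below. It relies only on the standard `torch.cuda` query functions. The helper name `describe_gpu` is hypothetical and not part of this repository, and the sketch does not try to decide whether the reported GPU is fast enough.

```python
def describe_gpu() -> str:
    """Return a one-line description of the CUDA device PyTorch would use."""
    try:
        # Imported lazily so the script still runs where PyTorch is missing.
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment."
    if not torch.cuda.is_available():
        return "PyTorch is installed, but no CUDA-capable GPU is visible."
    name = torch.cuda.get_device_name(0)
    major, minor = torch.cuda.get_device_capability(0)
    return f"Found {name} (compute capability {major}.{minor})."


if __name__ == "__main__":
    print(describe_gpu())
```

If the output reports that no CUDA-capable GPU is visible, the driver or the CUDA build of PyTorch is the first thing to check.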

The character_model_ifacialmocap_puppeteer program requires an iOS device that is capable of computing blend shape parameters from a video feed. This means that the device must be able to run iOS 11.0 or higher and must have a TrueDepth front-facing camera. (See this page for more info.) In other words, if you have the iPhone X or something better, you should be all set. Personally, I have used an iPhone 12 mini.

The character_model_mediapipe_puppeteer program requires a web camera.

Software Requirements

GPU Driver and CUDA Toolkit

Please update your GPU's device driver and install a CUDA Toolkit that is compatible with your GPU and is newer than the CUDA version used by the PyTorch build you will be installing in the next subsection.

Python and Python Libraries

All programs are written in the Python programming language. The following libraries are required:

  • python 3.10.11
  • torch 1.13.1 with CUDA support
  • torchvision 0.14.1
  • tensorboard 2.15.1
  • opencv-python 4.8.1.78
  • wxpython 4.2.1
  • numpy-quaternion 2022.4.2
  • pillow 9.4.0
  • matplotlib 3.6.3
  • einops 0.6.0
  • mediapipe 0.10.3
  • numpy 1.26.3
  • scipy 1.12.0
  • omegaconf 2.3.0

Instead of installing these libraries yourself, you should follow the recommended method to set up a Python environment in the next section.
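If you do end up with an environment of your own, the following sketch compares the installed versions against the list above using the standard library's `importlib.metadata`. The function name `find_mismatches` is hypothetical, and the distribution names are assumed to match the names in the list; adjust them if your package index uses different spellings.

```python
from importlib import metadata

# Versions copied from the list above, keyed by distribution name.
REQUIRED = {
    "torch": "1.13.1",
    "torchvision": "0.14.1",
    "tensorboard": "2.15.1",
    "opencv-python": "4.8.1.78",
    "wxpython": "4.2.1",
    "numpy-quaternion": "2022.4.2",
    "pillow": "9.4.0",
    "matplotlib": "3.6.3",
    "einops": "0.6.0",
    "mediapipe": "0.10.3",
    "numpy": "1.26.3",
    "scipy": "1.12.0",
    "omegaconf": "2.3.0",
}


def find_mismatches(required, lookup=metadata.version):
    """Return {name: (wanted, found)}; found is None when a package is missing."""
    mismatches = {}
    for name, wanted in required.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        # CUDA builds report versions such as "1.13.1+cu117"; ignore the suffix.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(REQUIRED).items():
        print(f"{name}: wanted {wanted}, found {found}")
```

An empty output means every listed library is present at the listed version.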

iFacialMocap

If you want to use character_model_ifacialmocap_puppeteer, you will also need an iOS app called iFacialMocap (a 980 yen purchase in the App Store). Your iOS device and your computer must be on the same network. For example, you may connect them to the same wireless router.

Creating Python Environment

Installing Python

Please install Python 3.10.11.

I recommend using pyenv (or pyenv-win for Windows users) to manage multiple Python versions on your system. If you use pyenv, this repository has a .python-version file that indicates it would use Python 3.10.11. So, you will be using Python 3.10.11 automatically once you cd into the repository's directory.

Make sure that you can run Python from the command line.

Installing Poetry

Please install Poetry 1.7 or later. We will use it to automatically install the required libraries. Again, make sure that you can run it from the command line.
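The two "make sure you can run it from the command line" checks can be sketched in Python as follows. The helper names `python_matches` and `on_path` are hypothetical. The sketch only confirms that a `poetry` executable is on the PATH; it does not verify that it is version 1.7 or later.

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 10, 11)  # from the "Installing Python" step


def python_matches(version_info=sys.version_info) -> bool:
    """True when the interpreter is exactly the required release."""
    return tuple(version_info[:3]) == REQUIRED_PYTHON


def on_path(command: str) -> bool:
    """True when `command` resolves to an executable on the PATH."""
    return shutil.which(command) is not None


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_matches() else "expected 3.10.11"
    print(f"Python {current}: {status}")
    print("poetry on PATH:", on_path("poetry"))
```

Run it with the same `python` command you plan to use for creating the virtual environment, so that the check reflects the interpreter pyenv selects inside the repository's directory.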

Cloning the Repository

Please clone the repository to an arbitrary directory on your machine.

Instructions for Linux/OSX Users

  1. Open a shell.
  2. cd to the directory you just cloned the repository to.
    cd SOMEWHERE/talking-head-anime-4-demo
    
  3. Use Python to create a virtual environment under the venv directory.
    python -m venv venv --prompt talking-head-anime-4-demo
    
  4. Activate the newly created virtual environment. You can either use the script I provide:
    source bin/activate-venv.sh
    
    or do it yourself:
    source venv/bin/activate   
    
  5. Use Poetry to install libraries.
    cd poetry
    poetry install
    

Instructions for Windows Users

  1. Open a shell.
  2. cd to the directory you just cloned the repository to.
    cd SOMEWHERE\talking-head-anime-4-demo
    
  3. Use Python to create a virtual environment under the venv directory.
    python -m venv venv --prompt talking-head-anime-4-demo
    
  4. Activate the newly created virtual environment. You can either use the script I provide:
    bin\activate-venv.bat
    
    or do it yourself:
    venv\Scripts\activate   
    
  5. Use Poetry to install libraries.
    cd poetry
    poetry install
    

Download the Models/Dataset Files

THA4 Models

Please download this ZIP file hosted on Dropbox, and unzip it to the data/tha4 directory under the repository's directory. In the end, the directory tree should look like the following diagram:

+ talking-head-anime-4-demo
   + data
      - character_models
      - distill_examples
      + tha4
         - body_morpher.pt
         - eyebrow_decomposer.pt
         - eyebrow_morphing_combiner.pt
         - face_morpher.pt
         - upscaler.pt
      - images
      - third_party

Pose Dataset

If you want to create your own student models, you also need to download a dataset of poses that are needed for the training process. Download this pose_dataset.pt file and save it to the data folder. The directory tree should then look like the following diagram:

+ talking-head-anime-4-demo
   + data
      - character_models
      - distill_examples
      - tha4
      - images
      - third_party
      - pose_dataset.pt
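The two directory diagrams above can be checked mechanically. The sketch below lists whichever of the downloaded files are still missing; the file names come from the diagrams, while the function name `missing_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names taken from the data/tha4 diagram above.
THA4_FILES = [
    "body_morpher.pt",
    "eyebrow_decomposer.pt",
    "eyebrow_morphing_combiner.pt",
    "face_morpher.pt",
    "upscaler.pt",
]


def missing_files(repo_root, need_pose_dataset=True):
    """Return the expected download paths that do not exist under repo_root."""
    data = Path(repo_root) / "data"
    expected = [data / "tha4" / name for name in THA4_FILES]
    if need_pose_dataset:
        # Only needed if you want to train your own student models.
        expected.append(data / "pose_dataset.pt")
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Assumes the script is run from the repository's directory.
    for path in missing_files("."):
        print("missing:", path)
```

Pass `need_pose_dataset=False` if you only plan to run the existing models and do not intend to train student models.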

Running the Programs

The programs are located in the src/tha4/app directory. You need to run them from a shell with the provided scripts.

Instructions for Linux/OSX Users

  1. Open a shell.

  2. cd to the repository's directory.

    cd SOMEWHERE/talking-head-anime-4-demo
    
  3. Run a program.

    bin/run src/tha4/app/<program-file-name>
    

    where <program-file-name> can be replaced with:

    • character_model_ifacialmocap_puppeteer.py
    • character_model_manual_poser.py
    • character_model_mediapipe_puppeteer.py
    • distill.py
    • distiller_ui.py
    • full_manual_poser.py

Instructions for Windows Users

  1. Open a shell.

  2. cd to the repository's directory.

    cd SOMEWHERE\talking-head-anime-4-demo
    
  3. Run a program.

    bin\run.bat src\tha4\app\<program-file-name>
    

    where <program-file-name> can be replaced with:

    • character_model_ifacialmocap_puppeteer.py
    • character_model_manual_poser.py
    • character_model_mediapipe_puppeteer.py
    • distill.py
    • distiller_ui.py
    • full_manual_poser.py
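The per-platform commands above differ only in the launcher script and the path separator, so they can be assembled from one Python helper. This is a minimal sketch: `build_command` and `run` are hypothetical names, and the sketch assumes it is executed from the repository's directory with the virtual environment already set up.

```python
import os
import subprocess

# Program file names from the lists above (distiller_ui.py spelled as in the
# rest of this README).
PROGRAMS = {
    "character_model_ifacialmocap_puppeteer.py",
    "character_model_manual_poser.py",
    "character_model_mediapipe_puppeteer.py",
    "distill.py",
    "distiller_ui.py",
    "full_manual_poser.py",
}


def build_command(program, windows=(os.name == "nt")):
    """Return the launcher command for `program` as an argument list."""
    if program not in PROGRAMS:
        raise ValueError(f"unknown program: {program}")
    if windows:
        return ["bin\\run.bat", f"src\\tha4\\app\\{program}"]
    return ["bin/run", f"src/tha4/app/{program}"]


def run(program):
    """Launch a demo program and wait for it to exit."""
    subprocess.run(build_command(program), check=True)
```

For example, `run("full_manual_poser.py")` would start the manual poser through the provided script on either platform.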

Constraints on Input Images

In order for the system to work well, the input image must obey the following constraints:

  • It should be of resolution 512 x 512. (If the demo programs receive an input image of any other size, they will resize the image to this resolution and also output at this resolution.)
  • It must have an alpha channel.
  • It must contain only one humanoid character.
  • The character should be standing upright and facing forward.
  • The character's hands should be below and far from the head.
  • The head of the character should roughly be contained in the 128 x 128 box in the middle of the top half of the image.
  • The alpha value of every pixel that does not belong to the character (i.e., a background pixel) must be 0.

An example of an image that conforms to the above criteria
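The mechanical parts of these constraints can be checked before training. The sketch below is not part of the repository, and all function names in it are hypothetical. It makes two assumptions: the "128 x 128 box in the middle of the top half" is read as the box centered at pixel (256, 128), and the four corner pixels are treated as background, which is only a heuristic for the alpha constraint. Pose, hand position, and the number of characters still have to be judged by eye.

```python
IMAGE_SIZE = 512
HEAD_BOX_SIZE = 128


def head_box():
    """(left, top, right, bottom) of the head box, under the reading above."""
    center_x, center_y = IMAGE_SIZE // 2, IMAGE_SIZE // 4
    half = HEAD_BOX_SIZE // 2
    return (center_x - half, center_y - half, center_x + half, center_y + half)


def check_constraints(size, mode, corner_alphas):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if tuple(size) != (IMAGE_SIZE, IMAGE_SIZE):
        problems.append(f"size is {size[0]} x {size[1]}, expected 512 x 512")
    if mode != "RGBA":
        problems.append(f"mode is {mode}, expected RGBA (an alpha channel)")
    elif any(alpha != 0 for alpha in corner_alphas):
        problems.append("a corner pixel is not fully transparent")
    return problems


def check_file(path):
    """Load an image with Pillow and run the checks above on it."""
    from PIL import Image  # Pillow is in the dependency list

    with Image.open(path) as image:
        width, height = image.size
        corners = [(0, 0), (width - 1, 0), (0, height - 1), (width - 1, height - 1)]
        alphas = []
        if image.mode == "RGBA":
            alphas = [image.getpixel(corner)[3] for corner in corners]
        return check_constraints(image.size, image.mode, alphas)
```

Under the stated reading, `head_box()` gives `(192, 64, 320, 192)`, which can be drawn as a guide layer in an image editor while positioning the character.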

Documentation for the Tools

Disclaimer

The author is an employee of pixiv Inc. This project is a part of his work as a researcher.

However, this project is NOT a pixiv product. The company will NOT provide any support for this project. The author will try to support the project, but there are no Service Level Agreements (SLAs) that he will maintain.

The code is released under the MIT license. The THA4 models and the images under the data/images directory are released under the Creative Commons Attribution-NonCommercial 4.0 International license.

This repository redistributes a version of the Face landmark detection model from the MediaPipe project. The model has been released under the Apache License, Version 2.0.