Toonify3D: StyleGAN-based 3D Stylized Face Generator

Please zoom in to check our meshing in detail.

Abstract

Recent advances in generative models enable high-quality facial image stylization. Toonify is a popular StyleGAN-based framework that has been widely used for facial image stylization. Our goal is to create expressive 3D faces by turning Toonify into a 3D stylized face generator. Toonify is fine-tuned from a StyleGAN trained on standard faces using only a few gradient descent steps, so its features carry semantic and visual information aligned with the features of the original StyleGAN model. Based on this observation, we design a versatile 3D-lifting method for StyleGAN, StyleNormal, which regresses the surface normal map of a StyleGAN-generated face from StyleGAN features. Thanks to the feature alignment between Toonify and StyleGAN, StyleNormal, although trained only on regular faces, can be applied to various stylized faces without additional fine-tuning. To learn the local geometry of faces under various illuminations, we introduce a novel regularization term, the normal consistency loss, based on lighting manipulation in the GAN latent space. Finally, we present Toonify3D, a fully automated framework built on StyleNormal, which generates full-head 3D stylized avatars and supports GAN-based 3D facial expression editing.
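The normal consistency loss described above can be illustrated with a minimal sketch: normals predicted from a latent code should stay unchanged when only the lighting is edited in latent space. Everything here is a hypothetical placeholder (the `style_normal` predictor, the lighting-edit direction `d_light`, and all shapes), not the authors' implementation.

```python
import numpy as np

def normal_consistency_loss(style_normal, w, d_light, alpha=1.0):
    """Hypothetical sketch of the normal consistency regularizer.

    style_normal: callable mapping a latent code to an (H, W, 3) array of
                  unit surface normals (stand-in for the StyleNormal network).
    w:            latent code of the face.
    d_light:      assumed latent-space direction that changes only lighting.
    alpha:        strength of the lighting edit.
    """
    n_ref = style_normal(w)                     # normals under original lighting
    n_lit = style_normal(w + alpha * d_light)   # normals after a lighting edit
    # Penalize per-pixel angular deviation via (1 - cosine similarity);
    # identical normals give a loss of exactly zero.
    cos = np.sum(n_ref * n_lit, axis=-1)
    return float(np.mean(1.0 - cos))
```

With a lighting-invariant predictor the loss vanishes, which is the behavior the regularizer encourages during training.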



Gallery

Results of StyleNormal

Our StyleNormal network, trained on few-shot synthetic data, produces all the results below.

Image 1: Visualization of facial images and their pixel-aligned surface normals, produced by applying our method to the cartoon StyleGAN.
Image 2: Visualization of facial images and their pixel-aligned surface normals, produced by applying our method to the caricature StyleGAN.
Image 3: Visualization of facial images and their pixel-aligned surface normals, produced by applying our method to the original StyleGAN.

Shading visualization

Video 1: Shading rendering of surface normals from our method.

Shared topology

Video 2: Propagating a texture map onto the fourth 3D model. Thanks to the shared topology inherited from a template mesh, facial texture maps can be easily replaced.

Integration with facial performance capture

Video 3-1: Puppeteering our mesh using facial motion capture (Apple ARKit).
Video 3-2: Puppeteering our mesh using facial motion capture (Apple ARKit).

Citation

@inproceedings{Jang2024toonify3d,
  author    = {Wonjong Jang and Yucheol Jung and Hyomin Kim and Gwangjin Ju and Chaewon Son and Jooeun Son and Seungyong Lee},
  title     = {Toonify3D: StyleGAN-based 3D Stylized Face Generator},
  booktitle = {ACM SIGGRAPH},
  year      = {2024}
}

Dataset

To reproduce our work, you will need to purchase the dataset from the following link:
https://www.3dscanstore.com/hd-head-scans/hd-head-bundles/10xhd-head-scan-pack-05

Please understand that, to comply with the dataset license, we cannot freely distribute the training data.

Image credit: ©3DScanStore