Last 2 weeks in multimodal ai art (05/Jul - 19/Jul)

The first big text-to-video model is out (CogVideo), more releases in text-to-3D, 'image editing with text' getting better, and a bunch of community-trained diffusion models

Jul 20, 2022

Two weeks accumulated during the last stretch of my vacation! But it was worth the wait: there is a lot of amazing news!

Text-to-video updates

- CogVideo released (GitHub, Guided Demo, run it yourself for $1.10/h of compute)

by THUDM

The most powerful open-source text-to-video model is now publicly available. However, running it is still a challenge, as it requires commercial-grade GPU machines.

apolinário from multimodal ai art @multimodalart

It happened! Guided Demo and models released for CogVideo. Guided Demo: wudao.aminer.cn/cogvideo/ Models (recommended to run on A100s only, so a bit inaccessible for now; I'm exploring): github.com/THUDM/CogVideo Prompt: 燃烧的心 (Burning heart)

Text-to-3D (and 3D-to-text) updates

- CLIP-Actor released - generate moving 3D avatars (GitHub)

by AMI Lab @ POSTECH

apolinário from multimodal ai art @multimodalart

CLIP-Actor code released! github.com/postech-ami/CL… The codebase does both: it turns a human-pose mesh into your text description, and it also animates the movement based on the text! One more advancement in the very exciting text-to-3D scene.

- Pulsar+CLIP released - generate 3D point clouds (Colab)

by nev (@apeoffire)

apolinário from multimodal ai art @multimodalart

nev @apeoffire

Introducing Pulsar+CLIP: the smaller cousin of text2voxels. Instead of voxels, this notebook generates images from text with point clouds. Prompt: a blue cat https://t.co/zrrxRkMRp2

- PointCLIP released - classify 3D point clouds (GitHub)

by Renrui Zhang

apolinário from multimodal ai art @multimodalart

PointCLIP: Point Cloud Understanding by CLIP. It extends CLIP's capabilities so that it can understand 3D point clouds and classify them, one cool step toward zero-shot 3D environment recognition. The code for the paper was released too! github.com/ZrrSkywalker/P…

Text-to-image updates

- CF-CLIP released - edit images with words (GitHub)

by Yingchen Yu

apolinário from multimodal ai art @multimodalart

CF-CLIP code released: it essentially edits a StyleGAN image using the text you give it. Currently it has pre-trained models for editing faces, dogs and cats: github.com/yingchen001/CF…

AK @_akhaliq

Towards Counterfactual Image Manipulation via CLIP abs: https://t.co/TMP00dfO2j https://t.co/KYWXtrmbQN

- CLIP2StyleGAN released - transform images with words (GitHub)

by @AbdalRameen

apolinário from multimodal ai art @multimodalart

CLIP2StyleGAN code released: github.com/RameenAbdal/CL… The paper is from December 2021, but the code is out now! Currently it has pre-trained models for manipulating faces and cars. It is similar to CF-CLIP, but with more 'creative liberty' in the edits.

AK @_akhaliq

CLIP2StyleGAN: Unsupervised Extraction of StyleGAN Edit Directions abs: https://t.co/itJUcJ9DDV https://t.co/sHmsE0jYtP

- 6 community-trained diffusion models (Watercolor, Lithography, Medieval, Handpainted CG, Ukiyo-e Portraits, Liminal Spaces)

Watercolor, Lithography and Medieval by @KaliYuga_ai, Handpainted CG by @FeiArt_AiArt, Ukiyo-e Portraits by @avantcontra, Liminal Spaces by @JohnWowCool

apolinário from multimodal ai art @multimodalart

Three amazing new fine-tuned diffusion models by @KaliYuga_ai: Watercolor: github.com/KaliYuga-ai/Wa… Lithography: github.com/KaliYuga-ai/Li… Medieval: github.com/KaliYuga-ai/Me…