This (longer) week in multimodal ai art (25/Jun - 05/Jul)

This week we got a VQ-Diffusion Colab, 1-line Disco Diffusion, face editing with CLIP, a CLIP in Turkish and an amazing synthetic Aesthetic Captions dataset

Jul 05, 2022

Hi all! The newsletter is again on a slightly odd schedule - this time because I’m on vacation. However, this week's updates are really exciting!

* code released

Text-to-Image synthesizers:

- VQ-Diffusion Colab released (Colab)

by Cene655

VQ-Diffusion is a new approach to text-to-image generation released by Microsoft, in which a diffusion model operates on a VQ-VAE latent space. We reported on it here a few weeks ago, and now a Colab notebook is out for it.

multimodal ai art @multimodalart

Kind of stealthily Microsoft released "Improved VQ-Diffusion" - a follow-up on their technique that combines a VQ-VAE with diffusion. They released the code, the weights and a new VQ-VAE trained github.com/microsoft/VQ-D… I'm running the first experiments:
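
To make "a diffusion model on a VQ-VAE latent space" a bit more concrete, here is a toy sketch of the quantization step only: continuous latent vectors are snapped to the nearest entry of a codebook, and the resulting indices are the discrete tokens a model like this would work on. The codebook, the latents and the function names are made up for illustration; this is not Microsoft's code, and the diffusion part itself is not shown.

```python
import math


def quantize(vectors, codebook):
    """Map each continuous vector to the index of its nearest codebook entry."""
    indices = []
    for v in vectors:
        distances = [math.dist(v, c) for c in codebook]
        indices.append(distances.index(min(distances)))
    return indices


def dequantize(indices, codebook):
    """Look the codebook vectors back up from their indices."""
    return [codebook[i] for i in indices]


# Hypothetical 2-D codebook with four entries (real ones are much larger).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Hypothetical encoder outputs for three image patches.
latents = [(0.1, -0.2), (0.9, 0.8), (0.2, 0.7)]

tokens = quantize(latents, codebook)
print(tokens)                        # discrete tokens, one per patch
print(dequantize(tokens, codebook))  # what a decoder would receive
```

In this picture, the image is represented by a short grid of integer tokens rather than by pixels, which is what makes running diffusion in that space cheaper than running it on the full image.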

- Discoart released (GitHub)

by Jina AI

A Python library to run the famous Disco Diffusion text-to-image model with one line of code, while still supporting most of the colab notebook features.

Han Xiao @hxiao

Excited to announce 🪩 𝗗𝗶𝘀𝗰𝗼𝗔𝗿𝘁: create compelling Disco Diffusion artworks in just one line! Radically easy, fully optimized for Google Colab free tier. github.com/jina-ai/discoa… @multimodalart #creativeai #generativeart #opensource
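
For a rough idea of what "one line" looks like, here is a minimal sketch based on the `create` entry point shown in the project's README. The wrapper functions `disco_kwargs` and `generate` are my own hypothetical helpers, the prompt is just an example, and actually running it assumes `discoart` is installed on a machine with a suitable GPU.

```python
def disco_kwargs(prompt):
    """Keyword arguments passed to discoart's create(); only the prompt is set,
    everything else is left at the library defaults."""
    return {"text_prompts": prompt}


def generate(prompt):
    """Run Disco Diffusion through discoart for a single text prompt."""
    # Imported lazily: discoart is a heavy dependency that expects a GPU.
    from discoart import create

    return create(**disco_kwargs(prompt))


# On a machine with discoart installed, this is the "one line":
# generate("A painting of sea cliffs in a tumultuous storm")
```

The lazy import is only there so the snippet can be read and loaded without the library; in a Colab cell you would normally just import `create` at the top and call it directly.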

- Deep Image Diffusion Prior released (GitHub)

by @nousr_ and @laion_ai

Deep Image Diffusion Prior is a technique that combines the DALL-E 2-style CLIP text-to-image embedding with the Deep Image Prior technique by Katherine Crowson and Daniel Russell to visualize the features in CLIP's weights that correspond to activations from your prompt.

multimodal ai art @multimodalart

Deep Image Diffusion Prior released: visualize CLIP! @nousr_ together with @laion_ai combined the DALL-E 2-like text to image embedding for CLIP with deep image prior! Can be seen as a 'what is CLIP seeing' kind of model github.com/LAION-AI/deep-…

- MGAD - image prompts to Diffusion (GitHub)

by Nisha Huang

MGAD lets modalities other than text be used as inputs to text-to-image diffusion models, so image prompts can help guide and style the generated images.

multimodal ai art @multimodalart

Nisha Huang released code for not-yet-released paper "Draw Your Art Dream: Diverse Digital Art Synthesis with Multimodal Guided Diffusion" github.com/haha-lisa/MGAD… Model supports more modalities - such as image prompts, together with text - as input for diffusion - to be explored

- TeCM-CLIP - manipulate faces with CLIP (GitHub)

by Lou Xudong

TeCM-CLIP is a face editing/manipulation tool that uses natural-language text commands to edit existing face images.

multimodal ai art @multimodalart

Lou Xudong released TeCM-CLIP - Text-based Controllable Multi-attribute Face Image Manipulation, a model specialised in face editing with text commands using CLIP: github.com/lxd941213/TeCM…

New CLIP and CLIP-like models:

- TrCLIP - Turkish CLIP released (GitHub)

by Yusuf Anı

Yusuf Anı on GitHub released TrCLIP - a CLIP model in the Turkish language. More information will be out after the INISTA 2022 conference.

Datasets:

- Simulacra Aesthetic Captions (GitHub)

by John David Pressman

JD Pressman released Simulacra Aesthetic Captions - a dataset of images generated from text, together with their prompts and the aesthetic scores users gave them. The dataset enables prompt analysis, training aesthetic predictors (such as LAION's), and many more use cases listed on the GitHub page.
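
As a small illustration of the "prompt analysis" use case, here is a sketch that averages user ratings per prompt and ranks the prompts. The records and the field names `prompt` and `rating` are invented for the example and do not reflect the dataset's actual schema; check the GitHub page for the real format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: one entry per (generated image, user rating) pair.
records = [
    {"prompt": "a castle at sunset", "rating": 8},
    {"prompt": "a castle at sunset", "rating": 6},
    {"prompt": "portrait of a robot", "rating": 4},
    {"prompt": "portrait of a robot", "rating": 5},
    {"prompt": "a misty forest", "rating": 9},
]


def mean_rating_by_prompt(rows):
    """Group ratings by prompt and return the mean rating for each prompt."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["prompt"]].append(row["rating"])
    return {prompt: mean(scores) for prompt, scores in grouped.items()}


averages = mean_rating_by_prompt(records)

# Rank prompts from highest to lowest average rating.
ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
for prompt, score in ranked:
    print(f"{score:.1f}  {prompt}")
```

The same grouping could feed an aesthetic predictor: instead of averaging by prompt, you would pair each image (or its embedding) with its rating and fit a regressor on those pairs.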