To quote Wikipedia: “Stable Diffusion is a deep learning, text-to-image model released in 2022. It is primarily used to generate detailed images conditioned on text descriptions, though it can also be applied to other tasks such as inpainting, outpainting, and generating image-to-image translations guided by a text prompt.” This sounds nice, but what makes it special and how can you use it?

What makes Stable Diffusion special?

The dataset

The Stable Diffusion model was trained on three subsets of LAION-5B: laion2B-en, laion-high-resolution, and laion-aesthetics v2 5+. These datasets have been scraped from the web and are available for download. This sets Stable Diffusion apart from, for example, DALL-E and Midjourney, whose datasets are not publicly available.

The model

The model is publicly available, again in contrast to most of the competition. This means you can do things like add additional training material (for example with Dreambooth) to change the context of an object. You can also alter the code yourself, for example to disable the NSFW check or the hidden watermark that the Stable Diffusion software adds.

Running locally

There is also a large community around Stable Diffusion that creates tools around the model, such as Stable Diffusion WebUI or the previously mentioned Dreambooth. Since the model is publicly available, you can also run it yourself on your laptop for free, so you don’t depend on a third party offering it as a SaaS solution.

Features

Text to image generation

You can use a prompt to indicate what you want to have created. You can give weights to specific words in the prompt. The weight can also be negative for things you don’t want to see in your output. The prompt usually contains things like the object or creature you want to see and the style. For example, I generated the image below for my 5-year-old daughter:

Figure: Cute fluffy animals generated using Stable Diffusion

The prompt used for the above image was:

“a beautiful cute fluffy baby animal with a fantasy background, style of kieran yanner, barret frymire, 8k resolution, dark fantasy concept art, by Greg Rutkowski, dynamic lighting, hyperdetailed, intricately detailed, trending on Artstation, deep color, volumetric lighting, Alphonse Mucha, Jordan Grimmer”
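To make the idea of per-word weights more concrete, here is a minimal sketch of how a weighted prompt could be split into phrases and weights. The `(text:weight)` notation is modelled loosely on the emphasis notation used by Stable Diffusion WebUI. The function names and the treatment of negative weights are assumptions made for this illustration, and real tools parse prompts differently.

```python
import re

# Matches "(some text:1.3)"; the weight may be negative or fractional.
_WEIGHTED = re.compile(r"\(([^():]+):\s*(-?\d+(?:\.\d+)?)\)")


def _plain_parts(text, weight):
    """Split unweighted text on commas and attach the default weight."""
    return [(chunk.strip(), weight) for chunk in text.split(",") if chunk.strip()]


def parse_weighted_prompt(prompt, default_weight=1.0):
    """Split a prompt into (text, weight) pairs.

    Assumed syntax (illustrative only): "(text:weight)" marks a weighted
    phrase; everything else is split on commas and gets default_weight.
    """
    parts = []
    position = 0
    for match in _WEIGHTED.finditer(prompt):
        # Unweighted text that precedes this weighted phrase.
        parts.extend(_plain_parts(prompt[position:match.start()], default_weight))
        parts.append((match.group(1).strip(), float(match.group(2))))
        position = match.end()
    # Unweighted text after the last weighted phrase.
    parts.extend(_plain_parts(prompt[position:], default_weight))
    return parts


def split_by_sign(parts):
    """Separate wanted phrases from unwanted (negatively weighted) ones."""
    positive = [(text, weight) for text, weight in parts if weight >= 0]
    negative = [(text, -weight) for text, weight in parts if weight < 0]
    return positive, negative


if __name__ == "__main__":
    example = "a cute fluffy baby animal, (fantasy background:1.3), (blurry:-1)"
    wanted, unwanted = split_by_sign(parse_weighted_prompt(example))
    print("wanted:", wanted)
    print("unwanted:", unwanted)
```

Splitting by sign mirrors what many interfaces do in practice: things you want go into the prompt, things you do not want go into a separate negative prompt.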

Image to text / CLIP interrogation

You can ask the model what it sees in a picture, so you can use this text to generate similar images. This is called CLIP interrogation and can be done with online tools.

Figure: CLIP interrogation. What is in the picture?

Inpainting

You can replace a part of an image with something else. For example, in the image below I’ve replaced the dog on the bench with a cat (I prefer cats).

Figure: Replacing the dog with a cat using inpainting

Outpainting

You can ask the model to generate additional areas around an existing image. For example below is a picture of me. I asked Stable Diffusion to generate a body below my head.

Figure: Generating a body to fit the head

There is even a complete web interface, Stable Diffusion Infinity, to help you do this on a canvas:

Figure: Stable Diffusion Infinity. Outpainting on a canvas
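Conceptually, outpainting comes down to placing the original picture on a larger canvas and telling the model which area is new and still has to be generated. The sketch below only does that bookkeeping in plain Python; it does not call the model. The function name `plan_outpaint` and the mask convention (1 = generate, 0 = keep) are assumptions for this illustration, not the API of any particular tool.

```python
def plan_outpaint(width, height, left=0, top=0, right=0, bottom=0):
    """Plan an outpainting step for an image of width x height pixels.

    Returns the enlarged canvas size, the offset at which the original
    image is pasted, and a mask with 1 for pixels to generate and 0 for
    pixels that belong to the original image.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image size must be positive")
    if min(left, top, right, bottom) < 0:
        raise ValueError("padding must not be negative")

    canvas_width = left + width + right
    canvas_height = top + height + bottom

    mask = [
        [
            0 if (left <= x < left + width and top <= y < top + height) else 1
            for x in range(canvas_width)
        ]
        for y in range(canvas_height)
    ]
    return (canvas_width, canvas_height), (left, top), mask


if __name__ == "__main__":
    # A 512 x 512 portrait that should get 256 extra pixels below the head.
    size, offset, mask = plan_outpaint(512, 512, bottom=256)
    to_generate = sum(sum(row) for row in mask)
    print(size, offset, to_generate)
```

For the portrait example above, this corresponds to keeping the head untouched and only masking the strip below it for generation.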

Upscaling

You can upscale images to add detail. This allows you to create infinite zoom effects.

Figure: Simulating infinite zoom effects by upscaling

This is not an actual infinite zoom: the model invents the detail it adds. If, for example, I upscale a low-resolution image of myself, the end result will not be me but something which kinda looks like me.

Figure: Upscaling low resolution images. Me, myself and I but not really

How to use?

Night Cafe Studio

The easiest way to start is by playing around in Night Café Studio. For this you don’t need to set up anything locally, and you can get a bit of a feel for what Stable Diffusion is and how it works. When you start to use it more often, you are required to pay, but you can get some free credits daily and by participating in the community.

Running locally

If you want to run Stable Diffusion locally, you can use Stable Diffusion WebUI. How to get it running is described for Google Colab, local Windows and Mac (untested).

When you’ve started the UI, you can use the various settings to generate images:

Figure: Some settings on the Stable Diffusion WebUI screen

There is also Stable Diffusion Infinity, which specializes in outpainting. You can download it or try it online.

You do require a suitable graphics card. An NVIDIA card with 4 GB of VRAM is about the minimum. With 6 GB, I had to use the following switches in webui-user.bat to generate and outpaint larger images: --medvram --opt-split-attention
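The WebUI is not the only way to run the model locally; it can also be scripted. The following is a minimal sketch that assumes the Hugging Face diffusers library, PyTorch and an NVIDIA GPU are available, none of which the setup above requires. The helper names `build_settings` and `generate` are made up for this example, and `model_id` stands for whichever model repository or local folder holds the weights you downloaded.

```python
def build_settings(prompt, negative_prompt="", width=512, height=512,
                   steps=30, guidance_scale=7.5):
    """Collect the keyword arguments for one text-to-image call."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt or None,
        "width": width,
        "height": height,
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
    }


def generate(settings, model_id, output_path="output.png"):
    """Generate one image locally and save it to output_path.

    Assumes torch and diffusers are installed and a CUDA GPU is present.
    """
    # Imported here so build_settings works without these packages.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")
    # Trades some speed for lower VRAM use on smaller cards.
    pipe.enable_attention_slicing()

    image = pipe(**settings).images[0]
    image.save(output_path)
    return output_path


if __name__ == "__main__":
    settings = build_settings(
        "a beautiful cute fluffy baby animal with a fantasy background",
        negative_prompt="draft",
    )
    print(settings)
    # generate(settings, model_id="path/to/your/downloaded/model")
```

The default values for steps and guidance scale are assumptions chosen as a starting point; expect to tune them just as you would in the WebUI.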

Challenges and limitations

Ethics

“With great power comes great responsibility” (often attributed to Voltaire, although he died in 1778; a similar phrase appears in French National Convention records from 1793). When the power to generate images becomes available to a large audience, issues such as abuse of this technology are bound to arise. Some examples:

You can alter copyrighted material, remove watermarks, upscale thumbnails or low-resolution photos, and make variations which are hard to trace back to the original. This allows a person to circumvent certain online protections of images.
You can create fake news, for example a photo of a large audience at Trump’s inauguration.
You can use the style of artists and their names without permission to create works of art, and then compete with these same artists using the generated works. It is also currently not easy for an artist to opt out of AI models in order to protect their work and style. You can imagine artists are not happy about this.
It becomes easy to generate NSFW material (search for Unstable Diffusion, for example). This can be abused, for example by using someone’s Facebook pictures as base material without their permission.

Currently (03-01-2023), as far as I know, not many limitations have been fixed in legislation yet. In the future, the freedom to create or use AI models might be limited, or only allowed when it conforms to certain conditions. For now, the AI world is like the early internet: a Wild West with few bounds.

Model limitations

Common things are easy, uncommon things are not: the model struggles with less common poses (e.g. hands) and less common or highly detailed objects (e.g. a crossbow).
Resolution: 512 x 512 is the default and the resolution at which the model (SD 1.5) works best. Other sizes are possible in multiples of 64, e.g. 576, 640, 704. Stable Diffusion 2.1 works at 768 x 768 resolution.
A good graphics card is required: an NVIDIA card with 4 GB of VRAM is the absolute minimum and 8 GB is preferable (or use the cloud, e.g. Google Colab).
Generation takes time and requires patience: it can take hours to generate (multiple variants of) images when running locally.
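The multiples-of-64 rule is easy to get wrong by hand, so a small helper can round a requested size to the nearest valid one. This is a plain Python sketch; the function names are made up for this example and the default of 64 is taken from the rule described above.

```python
def snap_to_multiple(value, multiple=64):
    """Round a pixel size to the nearest multiple (ties round up).

    The result is never smaller than one multiple.
    """
    if value <= 0 or multiple <= 0:
        raise ValueError("value and multiple must be positive")
    # Integer arithmetic avoids the round-half-to-even behaviour of round().
    snapped = (value + multiple // 2) // multiple * multiple
    return max(multiple, snapped)


def is_valid_size(width, height, multiple=64):
    """Check whether both dimensions are positive multiples."""
    return (width > 0 and height > 0
            and width % multiple == 0 and height % multiple == 0)


if __name__ == "__main__":
    for requested in (512, 578, 600, 700):
        print(requested, "->", snap_to_multiple(requested))
```

For example, a requested width of 578 is not a multiple of 64 and would be snapped to 576.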

Learning curve

Setting up your environment requires some knowledge.
Tweaking your generation configuration is not straightforward and requires you to understand a bit of what is actually happening.
Writing prompts which create nice images is not as straightforward as you might expect. For example, you need to know which artists work in the style you want your images to have. There are also words which help, such as ‘high resolution’, and negative prompts such as ‘draft’. Knowing which words to use plays a major part in generating good images.
Establishing a workflow is important. A general approach is generation first, then inpainting, then upscaling. The inpainting phase in particular takes a lot of time.

Adding all of this together makes for quite a steep learning curve before you can start creating works which are actually aesthetically pleasing.

Stable Diffusion: Image Generation using AI

About The Author

Maarten Smeets
