Stable Diffusion: Image Generation using AI

Maarten Smeets

To quote Wikipedia (here): “Stable Diffusion is a deep learning, text-to-image model released in 2022. It is primarily used to generate detailed images conditioned on text descriptions, though it can also be applied to other tasks such as inpainting, outpainting, and generating image-to-image translations guided by a text prompt.” This of course sounds nice, but what makes it special and how can you use it?

What makes Stable Diffusion special?

The dataset

The Stable Diffusion model was trained on three subsets of LAION-5B: laion2B-en, laion-high-resolution, and laion-aesthetics v2 5+. These datasets have been scraped from the web and are available for download (here). This sets Stable Diffusion apart from, for example, DALL-E and Midjourney, whose datasets are not publicly available.

The model

The model itself is publicly available here and here, again in contrast to most of the competition. This means you can do things like train it on additional material (for example with Dreambooth) to change the context of an object. You can also alter the code yourself, for example to disable the NSFW check or to disable the invisible watermark the Stable Diffusion software adds to generated images.

Running locally

There is also a large community around Stable Diffusion which creates tools around the model, such as the Stable Diffusion WebUI or the previously mentioned Dreambooth. Since the model is publicly available, you can run it yourself on your own machine for free and you do not depend on a third party offering it as a SaaS solution.

Features

Text to image generation

You can use a prompt to indicate what you want to have created. You can give weights to specific words in the prompt, and weights can also be negative for things you do not want to see in your output. The prompt usually contains things like the object or creature you want to see and the style. For example, I generated the image below for my 5-year-old daughter:

Cute fluffy animals generated using Stable Diffusion

The prompt used for the above image was:

“a beautiful cute fluffy baby animal with a fantasy background, style of kieran yanner, barret frymire, 8k resolution, dark fantasy concept art, by Greg Rutkowski, dynamic lighting, hyperdetailed, intricately detailed, trending on Artstation, deep color, volumetric lighting, Alphonse Mucha, Jordan Grimmer”
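If you prefer to script this instead of using a web UI, a minimal text-to-image sketch with the Hugging Face diffusers library could look as follows. This is purely an illustration, not the setup used for the image above; the model name, prompt and parameter values are assumptions.

```python
# Minimal text-to-image sketch using the Hugging Face diffusers library.
# Model name, prompt and parameter values are illustrative assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

prompt = ("a beautiful cute fluffy baby animal with a fantasy background, "
          "8k resolution, dark fantasy concept art, dynamic lighting")
negative_prompt = "draft, blurry, low quality"  # things you do not want to see

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=30,  # more steps means more refinement, but slower
    guidance_scale=7.5,      # how strongly the output should follow the prompt
).images[0]
image.save("fluffy_animal.png")
```

In web UIs such as the Stable Diffusion WebUI, weighting individual words is typically done with a syntax like (word:1.3), and negative prompts go into a separate field.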

Image to text / CLIP interrogation

You can ask the model what it sees in a picture, and then use this text to generate similar images. This is called CLIP interrogation and can be done, for example, here.

CLIP interrogation. What is in the picture?
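As a sketch of how this can be scripted, the open-source clip-interrogator Python package produces a prompt-like description from an image. The package choice and its Config/Interrogator API are assumptions here (based on its README), and the input file name is made up.

```python
# Sketch: derive a prompt-like description from an existing image
# using the clip-interrogator package (treat the API as an assumption).
from PIL import Image
from clip_interrogator import Config, Interrogator

image = Image.open("my_picture.jpg").convert("RGB")  # hypothetical input file
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
print(ci.interrogate(image))  # prints a text prompt you can reuse for generation
```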

Inpainting

You can replace part of an image with something else. For example, in the image below I have replaced the dog on the bench with a cat (I prefer cats).

Replacing the dog with a cat using inpainting
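A hedged sketch of inpainting with the diffusers library: you provide the original image plus a mask that marks the area to be replaced. The file names, prompt and model choice below are illustrative assumptions, not the exact steps used for the image above.

```python
# Inpainting sketch with diffusers: replace the masked area with new content.
# File names, prompt and model choice are illustrative assumptions.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("dog_on_bench.png").convert("RGB").resize((512, 512))
mask_image = Image.open("dog_mask.png").convert("RGB").resize((512, 512))  # white = replace, black = keep

result = pipe(
    prompt="a cat sitting on a bench",
    image=init_image,
    mask_image=mask_image,
).images[0]
result.save("cat_on_bench.png")
```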

Outpainting

You can ask the model to generate additional areas around an existing image. For example, below is a picture of me; I asked Stable Diffusion to generate a body below my head.

Generating a body to fit the head

There is even a complete web interface, Stable Diffusion Infinity, which helps you do this on a canvas:

Stable Diffusion Infinity. Outpainting on a canvas
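Under the hood, outpainting can be done with the same inpainting machinery: enlarge the canvas, mask the empty area and let the model fill it in. Below is a rough sketch; the dimensions, file names and prompt are assumptions, and dedicated tools such as Stable Diffusion Infinity tile and blend the result much more carefully.

```python
# Outpainting sketch: extend the canvas downwards and let the inpainting
# pipeline fill in the new (masked) area. All names and values are illustrative.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

portrait = Image.open("my_head.png").convert("RGB").resize((512, 512))

# New canvas twice as high; the bottom half is empty and will be generated.
canvas = Image.new("RGB", (512, 1024), "black")
canvas.paste(portrait, (0, 0))

mask = Image.new("RGB", (512, 1024), "white")               # white = generate
mask.paste(Image.new("RGB", (512, 512), "black"), (0, 0))   # black = keep original

result = pipe(
    prompt="a man wearing a suit, full body portrait",
    image=canvas,
    mask_image=mask,
    width=512,
    height=1024,
).images[0]
result.save("outpainted.png")
```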

Upscaling

You can upscale images to add detail. This allows you to create infinite zoom effects.

Simulating infinite zoom effects by upscaling

This is not an actual infinite zoom; the model invents detail as it upscales. If, for example, I upscale a low-resolution image of myself, the end result will not be me but something which only roughly looks like me.

Upscaling low resolution images. Me, myself and I but not really
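Upscaling can also be scripted, for example with the diffusers x4 upscaler. This is a sketch only; the model name, prompt and file names are assumptions.

```python
# Upscaling sketch: 4x upscale that invents plausible detail, so the result
# only approximates the original subject. Names and values are illustrative.
import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler",
    torch_dtype=torch.float16,
).to("cuda")

low_res = Image.open("me_low_res.png").convert("RGB")  # e.g. a small, blurry photo
upscaled = pipe(prompt="a photo of a man, highly detailed", image=low_res).images[0]
upscaled.save("me_4x.png")
```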

How to use?

Night Cafe Studio

The easiest way to start is by playing around in Night Café Studio. For this you do not need to set up anything locally, and you can get a feel for what Stable Diffusion is and how it works. When you start to use it more often, you are required to pay, but you can get some free credits daily and by participating in the community.

Running locally

If you want to run Stable Diffusion locally, you can use the WebUI available here. How to get it running is described for Google Colab, local Windows and Mac (untested).

When you have started the UI, you can use the various settings to generate images:

Some settings on the Stable Diffusion WebUI screen

There is also Stable Diffusion Infinity, which specializes in outpainting. You can download it here or try it online here.

You do require a suitable graphics card. An NVIDIA card with 4 GB of VRAM is about the minimum. With 6 GB, in order to generate and outpaint larger images, I had to use the following switches in webui-user.bat: --medvram --opt-split-attention
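For reference, in the WebUI these switches go into the COMMANDLINE_ARGS variable of webui-user.bat; the snippet below shows only that line and is a minimal example, so adjust it to your own setup.

```
rem webui-user.bat (relevant line only)
set COMMANDLINE_ARGS=--medvram --opt-split-attention
```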

Challenges and limitations

Ethics

“With great power comes great responsibility” (probably by Voltaire, 1793). When the power to generate images becomes available to a large audience, issues such as abuse of this technology are bound to arise. Some examples:

  • You can alter copyrighted material, remove watermarks, upscale thumbnails or low-resolution photos, and make variations which are hard to trace back to the original. This allows a person to circumvent certain online protections of images.
  • You can create fake news, for example a photo of a large audience at Trump’s inauguration.
  • You can use the style and name of artists without permission to create works of art and then compete with these same artists using the generated works. It is also currently not easy for an artist to opt out of AI models in order to protect their work and style. You can imagine artists are not happy about this.
  • It becomes easy to generate NSFW material (Google, for example, Unstable Diffusion). This can be abused by, for example, using someone’s Facebook pictures as base material without their permission.

Currently (03-01-2023) not many limitations have been laid down in legislation (as far as I know). In the future, the freedom to create or use AI models might be limited or only allowed under certain conditions. Currently the AI world is like the early internet: a Wild West with few bounds.

Model limitations

  • Common things are easy, uncommon things are not
    Less common poses (e.g. hands) and less common or highly detailed objects (e.g. a crossbow) often come out wrong.
  • Resolution
    512 x 512 is the default and the resolution the model (SD 1.5) works best at; it can handle multiples of 64, e.g. 576, 640, 704. Stable Diffusion 2.1 works at a resolution of 768 x 768.
  • Requires a good graphics card
    E.g. an NVIDIA card with 4 GB of VRAM as an absolute minimum, 8 GB preferable (or use the cloud, such as Google Colab).
  • Generation takes time and requires patience
    It can take hours to generate (multiple variants of) images when running locally.

Learning curve

  • Setting up your environment requires some knowledge.
  • Tweaking your generation configuration is not straightforward and requires you to understand a bit of what is actually happening.
  • Writing prompts which create nice images is not as straightforward as you might expect. For example, you need to know which artists create the style you want to generate images in. There are also words which help, such as ‘high resolution’, and negative prompts such as ‘draft’. Knowing which words to use plays a major part in generating good images.
  • Establishing a workflow is important. First generation, then inpainting, then upscaling is a general way to go about this. Especially the inpainting phase takes a lot of time.

All of this together makes for quite a steep learning curve before you can start creating works which are actually aesthetically pleasing.

