
basic prompt guide and image design hints for working with Stable Diffusion Models

PROMPT BASICS

Prompting has no universal ruleset yet, as it depends heavily on the data the model you are using was trained on and on how it was trained. However, if we work with recent Stable Diffusion models like 1.5 or XL, we can point out some fundamental rules that also apply to many other image generators.


basic image prompt elements

When we work with text only, we need to steer the generator in the direction we want it to go. The medium, the subject and stylistic aspects are some of the key points we need to give the machine to get a proper result. Precision in wording is more valuable than quantity here.

example of elements within an image prompt
one possible output of the example prompt

representational words

It is crucial to use representational words and vocabulary in the prompt, as models are unlikely to handle figurative phrases such as “better late than never”. It is best to describe in simple terms what you want to see in the picture, for example: “an illustration of a clock with a happy face”.

use of representational words

comma separation

Using multiple commas in a prompt can make it more precise, although the results are often ambiguous. As a rule of thumb, it makes sense to use commas to separate ideas or descriptions that belong together or need to be described as a whole.


sequence of words

The words in a prompt are weighted by their order: words that come first have more influence on the result.

effects of sequence of words in a prompt

manual weighting

As the length of your prompt increases, the importance of each word for the rendering result decreases. With user-defined weighting, you can manually emphasize or de-emphasize a certain aspect within your prompt. This is done by placing the relevant group of words in parentheses with a simple numerical indicator, e.g. (a huge house:1.3) to increase or (a big house:0.8) to decrease.

manual weighting within prompt
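
To make the weighting syntax more tangible, here is a minimal sketch of how a prompt with such groups could be split into text and weight pairs. The function name `parse_weights` and the default weight of 1.0 for unmarked text are assumptions made for illustration only. The sketch handles just the simple, non-nested form shown above and does not reproduce the parser of any particular tool.

```python
import re

# Matches one weighted group such as "(a huge house:1.3)" (no nesting).
WEIGHTED_GROUP = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def parse_weights(prompt, default=1.0):
    """Split a prompt into (text, weight) pairs.

    Text outside of parentheses gets the default weight (an assumption).
    """
    parts = []
    pos = 0
    for match in WEIGHTED_GROUP.finditer(prompt):
        # Plain text between the previous group and this one.
        plain = prompt[pos:match.start()].strip(" ,")
        if plain:
            parts.append((plain, default))
        parts.append((match.group(1).strip(), float(match.group(2))))
        pos = match.end()
    # Remaining plain text after the last weighted group.
    tail = prompt[pos:].strip(" ,")
    if tail:
        parts.append((tail, default))
    return parts


print(parse_weights("photo of a garden, (a huge house:1.3), fog"))
# [('photo of a garden', 1.0), ('a huge house', 1.3), ('fog', 1.0)]
```

How a given interface turns these weights into changes of the text conditioning is not covered here and differs between tools.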

negative prompting

A negative prompt allows the user to specify what the renderer should avoid without additional input. It is a parameter that tells the model what should not be included in the generated image. The negative prompt has the same form as the positive prompt.

simple example for a negative prompt

IMAGE BASICS

Besides the prompt, there are some other prominent settings that influence the render results.


The seed

The so-called “seed” is a core element of each rendering. Put simply, it can be understood as a form of visual noise that forms the starting point of each diffusion process. Conversely, this means that each rendering result can be reproduced exactly by using the same seed with an otherwise identical setup.

each seed produces different results
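
The idea that a seed fully determines the starting noise can be illustrated with a tiny stand-in. The sketch below uses Python's standard `random` module and a hypothetical 4 × 4 grid. Real pipelines draw a much larger noise tensor in latent space with their own random generator, so this only demonstrates the principle.

```python
import random


def initial_noise(seed, width=4, height=4):
    """Return a small grid of Gaussian noise determined only by the seed."""
    rng = random.Random(seed)  # private generator, global state stays untouched
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]


same_a = initial_noise(42)
same_b = initial_noise(42)
other = initial_noise(43)

print(same_a == same_b)  # True: same seed, same starting noise
print(same_a == other)   # False: a different seed gives different noise
```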


size and ratio

The chosen aspect ratio and the size of the desired rendering have a major influence on the result and the behavior of the render engine. In particular, non-square formats produce many interesting compositions. This also depends on the motif you want to visualize, so experiment with it.

same seed, same setup, different size and ratio
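
When experimenting with formats, it helps to keep the total pixel count roughly constant while changing only the ratio. The helper below is a minimal sketch under stated assumptions: the name `size_for_ratio`, the base size of 512 (a common choice for Stable Diffusion 1.5) and the snapping to multiples of 64 are illustrative defaults, not requirements taken from this guide.

```python
import math


def size_for_ratio(ratio_w, ratio_h, base=512, multiple=64):
    """Return (width, height) with roughly base*base pixels in the given ratio.

    Both sides are snapped to the nearest multiple of `multiple` (assumed 64).
    """
    area = base * base
    width = math.sqrt(area * ratio_w / ratio_h)
    height = area / width

    def snap(value):
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


print(size_for_ratio(1, 1))  # (512, 512)
print(size_for_ratio(3, 2))  # (640, 448)
print(size_for_ratio(2, 3))  # (448, 640)
```

For Stable Diffusion XL, a larger `base` such as 1024 would be the analogous starting point.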


sampler

The sampler can be understood as the method by which the information coming from the neural network is applied to the canvas (latent space). Each sampler has slightly different characteristics and works differently with each model.

same seed and setup, different sampler


steps

The step count indicates how “often” the image is reworked by the network during the diffusion process. Typically, each model has its own suitable range of steps for a specific result. It is therefore necessary to consult the documentation or simply experiment.

variation in steps for the diffusion process

cfg scale

The cfg scale (classifier-free guidance scale) is a factor that determines how strictly the diffusion process sticks to a given prompt. Higher values often lead to “overcooked” results, as shown in the example.

variants in application of cfg scale
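
The effect of the factor can be sketched with the usual formulation of classifier-free guidance: the network makes one prediction without the prompt and one with it, and the scale stretches the difference between the two. The function name `apply_cfg` and the short lists of made-up numbers are assumptions for illustration. In a real pipeline these values are large tensors of predicted noise.

```python
def apply_cfg(uncond, cond, cfg_scale):
    """Combine two predictions: uncond + cfg_scale * (cond - uncond)."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0]  # hypothetical prediction without the prompt
cond = [1.0, 3.0]    # hypothetical prediction with the prompt

print(apply_cfg(uncond, cond, 1.0))  # [1.0, 3.0]  -> plain prompted prediction
print(apply_cfg(uncond, cond, 8.0))  # [8.0, 17.0] -> difference strongly amplified
```

The larger the scale, the further the combined result is pushed beyond the prompted prediction, which is one way to picture why very high values tend to look “overcooked”.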

debugging matrices

The best settings depend quite heavily on the subject and style you want to visualize with generative engines. Creating a matrix of basic setup parameters with your current model is therefore quite helpful for finding the actual sweet spot.
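
Such a matrix can be planned as a simple grid of settings before any rendering happens. The following is a minimal sketch: the function name `render_matrix`, the dictionary keys and the example values are assumptions for illustration, and handing each cell to an actual renderer is left out on purpose.

```python
def render_matrix(steps_values, cfg_values, seed):
    """Build a grid of settings: one row per step count, one column per cfg scale.

    The seed stays fixed so that only the two tested parameters vary.
    """
    return [
        [{"steps": steps, "cfg_scale": cfg, "seed": seed} for cfg in cfg_values]
        for steps in steps_values
    ]


# Hypothetical test values; pick ranges that suit your model.
grid = render_matrix([12, 24, 36], [4.0, 8.0, 12.0], seed=1234)

for row in grid:
    print([(cell["steps"], cell["cfg_scale"]) for cell in row])
```

Keeping the prompt, model, sampler and seed fixed across all cells makes the rendered images directly comparable.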

steps / cfg scale


prompt: analog zoomed surreal portrait photo of androgyn adult casual person with realistic wrinkled skin in white oversized inflated used linen hoodie from side with white hood with detailed black traditional embroidery and white lace in front of old european building trees and grass wilderness, (holding black minimalistic sculpture object with hands:1.1) , white smoke clouds with diffuse lighting, depth of field, vignette

model: Realistic Vision 5.1 / Stable Diffusion 1.5
sampler: DPM++ 2M
seed: fixed

test render matrix – steps / cfg scale

sampler / scheduler type


prompt: analog zoomed surreal portrait photo of androgyn adult (female:0.2) casual person with realistic skin in white oversized inflated used linen hoodie from side with white hood with detailed black traditional embroidery and white lace in front of old european building trees and grass wilderness, (holding black minimalistic sculpture object with hands:1.1) , white dense smoke clouds, diffuse lighting, depth of field, vignette

cfg scale: 8.0
steps: 24
seed: fixed