PROMPT BASICS
There is no universal ruleset for prompting yet, because the results depend heavily on the data a model was trained on and on how it was trained. For recent Stable Diffusion models such as 1.5 or XL, however, we can point out some fundamental rules that also apply to many other image generators.
basic image prompt elements
When working with text only, we have to point the generator in the direction we want it to go. The medium, the subject and stylistic aspects are some of the key points the machine needs in order to produce a proper result. Precision in wording is more valuable than quantity here.
representational words
It is crucial to use representative words and vocabulary in the prompt, as models are unlikely to be able to handle phrases such as “better late than never”. It is best to stick to describing in simple terms what you want to see in the picture, for example: “an illustration of a clock with a happy face”.
comma separation
Using several commas in a prompt can make it more precise, although the results are often inconsistent. As a rule of thumb, it makes sense to use commas to separate ideas or descriptions that belong together or need to be described as a whole.
sequence of words
The words in a prompt are weighted by their order: words that come first are more important for the result.
manual weighting
As the length of your prompt increases, the importance of each word for the rendering result decreases. With user-defined weighting, you can manually emphasize or de-emphasize a certain aspect within your prompt. This is done by placing the relevant group of words in parentheses together with a simple numerical indicator, e.g. (a huge house:1.3) to increase or (a big house:0.8) to decrease.
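To make the notation concrete, here is a minimal sketch that splits a prompt into text fragments and their weights. The function name parse_weights and the regular expression are hypothetical and written only for illustration; this is not how any particular interface implements weighting, and real parsers usually also handle nesting and escaped characters.

```python
import re

# Matches one weighted group such as "(a huge house:1.3)".
# Assumption: no nested parentheses and no colons inside the group.
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def parse_weights(prompt, default=1.0):
    """Split a prompt into (text, weight) pairs.

    Text outside of a weighted group gets the default weight.
    """
    parts = []
    pos = 0
    for match in WEIGHTED.finditer(prompt):
        # Unweighted text between the previous group and this one.
        plain = prompt[pos:match.start()].strip(" ,")
        if plain:
            parts.append((plain, default))
        parts.append((match.group(1).strip(), float(match.group(2))))
        pos = match.end()
    # Unweighted text after the last group.
    tail = prompt[pos:].strip(" ,")
    if tail:
        parts.append((tail, default))
    return parts


example = "portrait photo, (a huge house:1.3), (fog:0.8), depth of field"
for text, weight in parse_weights(example):
    print(weight, text)
```

Fragments without a numerical indicator keep the neutral weight of 1.0, so only the marked groups are pushed up or down.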
negative prompting
A negative prompt allows the user to specify what the renderer should avoid without additional input. It is a parameter that tells the model what should not be included in the generated image. The negative prompt has the same form as the positive prompt.
IMAGE BASICS
Besides the prompt, there are some other prominent features that influence the render results.
seed
The so-called “seed” is a core element of each rendering. Put simply, it can be understood as a form of visual noise that forms the starting point of each diffusion process. Conversely, this means that each rendering result can be reconstructed exactly by using the same seed with otherwise identical settings.
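The principle can be illustrated with a toy example. The sketch below uses Python's standard random module as a stand-in for the noise generator of a real diffusion pipeline; the function starting_noise is hypothetical, and the list of numbers only represents the much larger noise image a real renderer starts from.

```python
import random


def starting_noise(seed, size=8):
    """Return a small list of Gaussian noise values for a given seed."""
    # An independent generator whose output is fully determined by the seed.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


a = starting_noise(1234)
b = starting_noise(1234)
c = starting_noise(1235)

print(a == b)  # same seed: identical starting noise
print(a == c)  # different seed: different starting noise
```

Because the starting noise is identical, a deterministic process that begins from it with the same prompt and parameters arrives at the same image.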
size and ratio
The chosen aspect ratio and size of the rendering have a major influence on the result and on the behavior of the render engine. Non-square formats in particular produce many interesting compositions. This also depends on the motif you want to visualize, so experiment with it.
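When experimenting with formats, it can help to keep the total pixel count roughly constant and change only the ratio. The helper below is a sketch under two assumptions that are not part of the text above: a pixel budget of 512 x 512, which fits Stable Diffusion 1.5 (SDXL is usually run at around 1024 x 1024), and side lengths rounded to multiples of 8. Some interfaces expect other multiples, so check your tool.

```python
def size_for_ratio(ratio_w, ratio_h, target_pixels=512 * 512, multiple=8):
    """Return (width, height) close to target_pixels for an aspect ratio.

    Both sides are rounded to the nearest multiple of `multiple`.
    """
    # Scale factor that brings ratio_w x ratio_h up to the pixel budget.
    scale = (target_pixels / (ratio_w * ratio_h)) ** 0.5
    width = max(multiple, round(ratio_w * scale / multiple) * multiple)
    height = max(multiple, round(ratio_h * scale / multiple) * multiple)
    return width, height


for ratio in [(1, 1), (3, 2), (16, 9), (2, 3)]:
    print(ratio, size_for_ratio(*ratio))
```

Passing target_pixels=1024 * 1024 gives the corresponding sizes for a larger budget.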
sampler
The sampler can be understood as the method by which the information coming from the neural network is applied to the canvas (latent space). Each sampler has slightly different characteristics and works differently with each model.
steps
The steps are a count that shows how “often” the image is reworked by the network during the diffusion process. Typically, each model has its own suitable range of steps for a specific result. It is therefore necessary to consult the documentation or simply to try it out.
cfg scale
The cfg scale (classifier-free guidance scale) is a factor that determines how strictly the diffusion process sticks to the given prompt. Higher values often lead to “overcooked” results, as shown in the example.
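The effect of the factor is easier to see in the usual classifier-free guidance formula, in which the prediction made without the prompt is pushed towards the prediction made with the prompt. The sketch below applies this formula to short lists of made-up numbers; the function apply_cfg and the values are illustrative only, whereas a real pipeline works on full noise predictions at every step.

```python
def apply_cfg(uncond, cond, cfg_scale):
    """Combine two noise predictions with classifier-free guidance.

    result = uncond + cfg_scale * (cond - uncond)
    """
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


# Made-up example values, not taken from a real model.
uncond = [0.10, -0.20, 0.05]  # prediction without the prompt
cond = [0.30, -0.10, 0.00]    # prediction with the prompt

for scale in (1.0, 8.0, 20.0):
    print(scale, apply_cfg(uncond, cond, scale))
```

A scale of 1.0 returns the prompted prediction unchanged, while larger values exaggerate the difference between the two predictions, which is one way to think about why very high settings look “overcooked”. In many implementations the negative prompt takes the place of the empty prompt in the unprompted prediction.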
debugging matrices
The right settings depend heavily on the subject and style you want to visualize with generative engines. Creating a matrix of basic setup parameters with your current model is therefore quite helpful for finding the actual sweet spot.
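Such a matrix is simply every combination of the values you want to compare, with everything else held fixed. The following sketch builds the list of settings for a steps / cfg scale grid; the function parameter_matrix, the dictionary keys, the value ranges and the seed number are placeholders chosen for illustration, and the actual rendering call depends on the tool you use.

```python
from itertools import product


def parameter_matrix(steps_values, cfg_values, base):
    """Return one settings dict per (steps, cfg scale) combination."""
    return [
        {**base, "steps": steps, "cfg_scale": cfg}
        for steps, cfg in product(steps_values, cfg_values)
    ]


# Fixed settings shared by every cell; the seed value is a placeholder.
base = {"sampler": "DPM++ 2M", "seed": 1234}

jobs = parameter_matrix([12, 24, 36], [4.0, 8.0, 12.0], base)
for job in jobs:
    print(job)
```

Keeping the seed and sampler fixed means that differences between the cells come only from the two parameters under test.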
steps / cfg scale
prompt: analog zoomed surreal portrait photo of androgyn adult casual person with realistic wrinkled skin in white oversized inflated used linen hoodie from side with white hood with detailed black traditional embroidery and white lace in front of old european building trees and grass wilderness, (holding black minimalistic sculpture object with hands:1.1) , white smoke clouds with diffuse lighting, depth of field, vignette
model: Realistic Vision 5.1 / Stable Diffusion 1.5
sampler: DPM++ 2M
seed: fixed
sampler / scheduler type
prompt: analog zoomed surreal portrait photo of androgyn adult (female:0.2) casual person with realistic skin in white oversized inflated used linen hoodie from side with white hood with detailed black traditional embroidery and white lace in front of old european building trees and grass wilderness, (holding black minimalistic sculpture object with hands:1.1) , white dense smoke clouds, diffuse lighting, depth of field, vignette
cfg scale: 8.0
steps: 24
seed: fixed