Consistent animation frame sequences of character or body motion currently seem to be best achieved by starting from clear, uncluttered raw video material. This means clean shots of the body and (improvised 🙂) props, so that img2img pipelines can be run with as low a denoising strength as possible. Here are some examples:
PROMPT: ghibli style, robot dancer in a fluffy melted workshop, fine details, fine lines, sharp, contrast, outlines – CFG: 11 / STEPS: 25
All images were generated with the Ghibli-Diffusion model, which was trained explicitly on frames from this fantastic animation studio.
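For anyone who wants to try this frame by frame, here is a rough sketch of how the settings above could be wired up with the Hugging Face `diffusers` library. It is not the exact setup used for the images in this post. The hub ID `nitrosocke/Ghibli-Diffusion`, the strength of 0.3, the fixed seed, the 512x512 resize and the folder names are all assumptions; only the prompt, CFG 11 and 25 steps come from the post. The small `effective_steps` helper reflects my understanding that the img2img pipeline only runs roughly `steps * strength` denoising steps, which is why a low strength stays close to the source frame.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FrameSettings:
    prompt: str
    guidance_scale: float = 11.0    # CFG:11, from the post
    num_inference_steps: int = 25   # STEPS:25, from the post
    strength: float = 0.3           # assumption: the post only says "as low as possible"
    seed: int = 1234                # assumption: same seed for every frame


def effective_steps(settings: FrameSettings) -> int:
    """Approximate number of denoising steps img2img actually runs."""
    scaled = int(settings.num_inference_steps * settings.strength)
    return min(scaled, settings.num_inference_steps)


def stylize_frames(frame_dir, out_dir, settings,
                   model_id="nitrosocke/Ghibli-Diffusion"):  # assumed hub ID
    """Run every PNG frame in frame_dir through img2img with identical settings."""
    # Heavy imports live here so the helpers above work without a GPU setup.
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from PIL import Image

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to("cuda")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    for frame_path in sorted(Path(frame_dir).glob("*.png")):
        init = Image.open(frame_path).convert("RGB").resize((512, 512))
        # Re-seeding per frame keeps the noise identical across the sequence.
        generator = torch.Generator("cuda").manual_seed(settings.seed)
        result = pipe(
            prompt=settings.prompt,
            image=init,
            strength=settings.strength,
            guidance_scale=settings.guidance_scale,
            num_inference_steps=settings.num_inference_steps,
            generator=generator,
        ).images[0]
        result.save(out / frame_path.name)
```

Usage would look something like `stylize_frames("frames_in", "frames_out", FrameSettings(prompt="ghibli style, robot dancer in a fluffy melted workshop, fine details, fine lines, sharp, contrast, outlines"))`, with the frames exported from the raw video beforehand. Keeping seed, prompt and strength identical for every frame is one simple way to reduce flicker, though it will not remove it completely.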