Hi there,

I’m a complete amateur at design and painting, but I got kinda hooked on Stable Diffusion, because it (theoretically) lets me create images from my imagination without needing digital painting skills.

I’ve poured a couple of weekends of free time into learning how to use SD, and by now I’m somewhat familiar with making useful prompts and using ControlNet, inpainting and upscaling.

But now I’m a bit at a loss about how to further refine my workflow. Right now I can either get really good images that only roughly resemble the scene I was going for (letting the model/LoRAs do the heavy lifting), or I get an image that is composed exactly as I want (leaning heavily on ControlNet) but is very poorly executed in the details, with all sorts of distorted faces, ugly hands and so on.

Basically, if I give a vaguer prompt the image comes out great, but the more specific I try to be, the more the generation feels “strangled” by the prompt and ControlNet, and it doesn’t seem to produce usable images …

How do you approach this? Do you generate hundreds of images in the hope that one of them captures your envisioned scene? Do you make heavy use of Photoshop/GIMP for post-processing (which I want to avoid)? Or do you painstakingly inpaint all the small details until it fits?

Edit: Just to add a thought here: I’ve just started to realise how limited most models are in what they “recognise”. Everyday items are covered pretty well (e.g. prompting “smartphone” or “coffee machine” produces very good results), but things like “screwdriver” are already dicey, and with specialised terms like “halberd” it is completely hopeless. Seems I will need to go through with training my own LoRA as discussed in the other thread …

  • FactorSD@lemmy.dbzer0.com · 1 year ago

    I too hate having to use GIMP to fiddle with images. Really, the best approach depends more on how you think about things than anything else. With enough effort you can convince SD to do almost anything; it just depends on which phase of the process you are working in.

    My workflow is something like this:

    • Generate figures using a general prompt and a controlnet to set the composition in the frame. Reroll the seed until I get something nice, then freeze the seed.
    • If the composition is not quite right, spit the resulting image out into the OpenPose editor, detect the pose, then move the figure to fix whatever the problem is (repeating as necessary, back and forth between txt2img and OpenPose).
    • Turn on ADetailer, and at least set it to give me an appropriate face (no big foreheads)
    • Start refining the prompt, adding more words and more details. Use the prompt history extension so you never lose a good one. At this stage every generation should be very similar to the previous one, and that’s the point, so you can do constant better or worse comparisons. You can use X/Y plots to try out different values and so on here.
    • Add LoRAs, but expect to be disappointed. They will often fry the image unless strongly controlled.
    • If a LoRA doesn’t need to affect the whole scene, I use Latent Couple and Composable LoRA to confine it to the specific area it needs to be in.
    • Alternatively, I use Inpaint Anything to segment the image, then inpaint that way rather than regenerating from scratch.
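    The “freeze the seed, then vary one thing at a time” discipline behind the X/Y plot step above can be sketched in plain Python. This is only an illustration: generate() is a made-up stand-in for whatever backend you actually call, not a real API.

    ```python
    from itertools import product

    # Made-up stand-in for a real SD backend call; just returns a filename here.
    def generate(prompt: str, seed: int, cfg: float, steps: int) -> str:
        return f"out_seed{seed}_cfg{cfg}_steps{steps}.png"

    def xy_sweep(prompt: str, seed: int, cfg_values, step_values):
        """Emulate an X/Y plot: same prompt and frozen seed, one knob per axis,
        so every cell differs from its neighbours by exactly one parameter."""
        grid = []
        for steps, cfg in product(step_values, cfg_values):  # rows = steps, cols = cfg
            grid.append(((cfg, steps), generate(prompt, seed, cfg, steps)))
        return grid

    cells = xy_sweep("a knight holding a halberd", seed=1234,
                     cfg_values=[5.0, 7.5, 10.0], step_values=[20, 30])
    # 2 rows x 3 columns = 6 comparable generations from one frozen seed
    ```

    Because the seed is frozen, each cell differs from its neighbours in exactly one parameter, which is what makes the better-or-worse comparisons meaningful.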

    Sometimes you get better results regenerating a whole new image, sometimes you get better results from inpainting. The gods of SD are fickle.

    For things like hands and feet, I generally prefer to fix them in the txt2img phase, because they are the very devil to get right after the fact. At a minimum I want four fingers and a thumb on each hand.