• 3volver@lemmy.worldOP
    English
    arrow-up
    2
    ·
    6 months ago

    SD3 seems even worse than SD1.5 in some ways, though it is better at certain things, text specifically.

    • j4k3@lemmy.world
      English
      arrow-up
      4
      ·
      6 months ago

      They usually seem that way at first. I think it comes down to the massive amount of data the model starts with; it is effectively unbiased. I think of it like all the shelves in an enormous library lined up on a giant wall. It is very capable and full of information, but lacks any kind of focus.

      Also, when training the big models, as far as I understand it, they train until they find the sweet spot and then go past it. Once they know the curve and how much training counts as “overtrained,” they go back to a point a little before the peak, so that fine-tuning should place the model at the peak.
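
      A rough sketch of that idea, just to make it concrete. This is my reading of the process, not a documented procedure from any lab. The scores are made-up numbers, and `pick_release_checkpoint` and `margin` are hypothetical names:

```python
def pick_release_checkpoint(scores, margin=1):
    """Return the index of the checkpoint `margin` steps before the best score.

    `scores` holds one validation score per saved checkpoint (higher is better).
    Training is assumed to have run past the peak, so the peak is visible.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    # Index of the best-scoring checkpoint (first one wins on ties).
    peak = max(range(len(scores)), key=scores.__getitem__)
    # Step back a little so later fine-tuning has room to reach the peak.
    return max(peak - margin, 0)


# Hypothetical validation scores, one per checkpoint, not real measurements.
scores = [0.52, 0.61, 0.68, 0.72, 0.71, 0.66]
release_index = pick_release_checkpoint(scores, margin=1)
```

      With these numbers the peak is the fourth checkpoint, so the one just before it would be released as the base for fine-tuning.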

      The raw checkpoints usually lack focus in both positive and negative directions. It can be similar to how LLMs need really good instructions that help Name-2 understand its own role. If you try to define this role even with diffusion, you’re likely to get a change, maybe an improvement. Something like this may help, “You are a helpful generative AI that follows the prompt details exactly.\n\nPrompt:”. That actually helped with Pony when I tried it on the base checkpoint recently.
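
      If you want to try the prefix without retyping it each time, something like this works as a minimal sketch. `with_role_prefix` is a hypothetical helper and the fox prompt is only an example; how you feed the result into your UI or pipeline is up to you:

```python
# The role text quoted above; "\n\n" becomes two real newlines in Python.
ROLE_PREFIX = (
    "You are a helpful generative AI that follows the prompt details exactly."
    "\n\nPrompt:"
)


def with_role_prefix(prompt, prefix=ROLE_PREFIX):
    """Prepend the role text to a prompt, trimming stray whitespace."""
    return f"{prefix} {prompt.strip()}"


# Example prompt, purely illustrative.
example = with_role_prefix("  a red fox reading a newspaper, film photo ")
```

      Whether it helps will depend on the checkpoint and its text encoder, so compare against the same seed with and without the prefix.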