Negative/Positive prompts- Settings

#44
by Privac - opened

Are there specific "quality tags" that were used, or are being used, to guide the model during training, both positive and negative? The posted workflow uses "low quality, ugly, unfinished, out of focus, deformed, disfigure, blurry, smudged, restricted palette, flat colors": are these meant for general use, or are they specific to that image? In the official Comfy workflow the positive tag "aesthetic 2" is used, and it definitely seems to improve the overall composition. Are there others?
Also, are 26 steps the "right" number, or a compromise for faster inference? Is there an official scheduler/solver to use?

Privac changed discussion title from Negative/Positive prompts to Negative/Positive prompts- Settings

To be taken with a pinch of salt, but as far as I know / have investigated:

  • I only use illustration, anime, drawing, artwork, bad hands as negatives, and all but the last only because I prefer photos. I'm not even sure the last one is useful, since hands have improved in v33.
  • aesthetic 11 is something I've heard of too, but not for Chroma specifically. As a general rule, you'll get more illustrations with tags than with natural language, at least for now. It depends on what you want: if you don't want illustrations, I would suggest skipping such tags. I never use them.
  • sampler / scheduler / steps: I use euler / beta / 40 steps as my go-to config, but 40 steps is probably overkill. IIRC, the number of steps needed went down around v29-30, I can't remember exactly. So 26 or 30 steps should do it; I barely see the difference with 40 tbh.
  • final note: Chroma is getting impressive on prompt adherence (and trust me, I'm sooo bad at prompting). My humble piece of advice: the neg prompt is not your friend. Focus on what you want, not on what you don't want. Chroma seems to understand that better.
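
The settings from the list above can be written down as a small sketch. This is only one way to record them, not an official configuration: the key names `sampler_name`, `scheduler` and `steps` follow ComfyUI's KSampler inputs, while the helper `make_settings` and the `negative` key are hypothetical names used for illustration.

```python
# Sketch of the settings described above. make_settings() and the
# "negative" key are hypothetical; the values come from the post.
NEGATIVE_TAGS = ["illustration", "anime", "drawing", "artwork", "bad hands"]

def make_settings(steps: int = 30) -> dict:
    """Return a settings dict; the post says 26 or 30 steps should be enough."""
    if steps < 1:
        raise ValueError("steps must be a positive integer")
    return {
        "sampler_name": "euler",
        "scheduler": "beta",
        "steps": steps,
        "negative": ", ".join(NEGATIVE_TAGS),
    }

settings = make_settings()
print(settings)
```

Passing `steps=40` reproduces the go-to config mentioned in the post, which the author describes as probably overkill.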

What's the average number of words you use for a positive prompt, and how many words can the longer ones have without jeopardizing the output you want?

Around 150 words I would say, sometimes more, so probably much more in terms of tokens... yeah I know I'm a psychopath :)
But honestly, Chroma responds very well to this.

What I find important, though, is organizing my prompt correctly. For this I look at the preview in the KSampler, which gives an indication of what is computed and when. Some details are only processed at a later stage.
For example, your character's pose should certainly be at the top if you do a full body shot. But whether he/she has blue eyes, wears earrings, looks at me joyfully, ... can come later, unless you do a portrait ofc.

Generally speaking, I group similar stuff in the same sentence, so that the model processes it at the same moment; for example, I describe the position of legs and arms together, or the color of the eyes and the look together.
Also, I write each sentence on a new line; this makes editing easier and has 0 impact on prompt adherence afaik. It also lets me comment out some lines when testing (I use a node from my own extension to do that).
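
The one-sentence-per-line habit with commented-out lines can be sketched in a few lines of Python. This is not the custom node mentioned above, whose behavior isn't described here. The function name `build_prompt`, the `#` comment marker and the example prompt are all assumptions made for illustration.

```python
def build_prompt(text: str, comment_prefix: str = "#") -> str:
    """Drop blank and commented-out lines; keep the rest, one sentence per line."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(comment_prefix):
            continue  # skip empty lines and lines disabled for testing
        kept.append(stripped)
    return "\n".join(kept)

# Pose first, finer details later, related details grouped in one sentence.
draft = """
A woman stands on a beach, full body shot, arms crossed and legs apart.
# She wears a red raincoat.
She has blue eyes and looks at the viewer joyfully.
"""

prompt = build_prompt(draft)
word_count = len(prompt.split())  # words, not tokens
print(prompt)
print(word_count)
```

The word count is only a rough guide: the token count depends on the text encoder's tokenizer and will usually be higher.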

Is there a recommended aspect ratio that works better than the others? Training is done at 512px, but which aspect ratio is best trained: 1:1, 3:4, 16:9, etc.?

Don't know about the training itself (and would love to know), but I have used many different aspect ratios with very good results.
But, as you wrote, training is done at low res atm, and pushing the res too high will often degrade results, for example with worse lighting / atmosphere in photos. 1216x1216 is often not so good (yet).
Res I use the most: 1024x1024, 1152x1152, 896x1152, 832x1216 (and vice versa ofc).

  • Sometimes 832x1488, but I don't see the point most of the time.
  • And 1024x1152 or 1024x1216 have worked pretty well, too
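
To compare the resolutions listed above, here is a short sketch that prints each one's reduced aspect ratio and pixel count. It only does arithmetic on the sizes from this post and says nothing about which ratios were used in training; the helper name `describe` is hypothetical.

```python
from math import gcd

# Resolutions mentioned in the post (width, height).
RESOLUTIONS = [
    (1024, 1024), (1152, 1152), (896, 1152), (832, 1216),
    (832, 1488), (1024, 1152), (1024, 1216),
]

def describe(width: int, height: int) -> dict:
    """Return the reduced aspect ratio and the size in megapixels."""
    g = gcd(width, height)
    return {
        "ratio": f"{width // g}:{height // g}",
        "megapixels": round(width * height / 1e6, 2),
    }

for w, h in RESOLUTIONS:
    info = describe(w, h)
    print(f"{w}x{h}: ratio {info['ratio']}, {info['megapixels']} MP")
```

Swapping width and height gives the landscape variants.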
