Quant precision vs quality

#16

by GTManiK - opened Apr 30

Apr 30

Worth mentioning that 'float8_e4m3fn' is WAY worse in quality than BF16. It was already the case for Flux, but the difference there wasn't so dramatic. Is there any way to improve it?
I've seen 'scaled' variants of 8-bit Chroma here: https://huggingface.co/Clybius/Chroma-fp8-scaled , but it only outputs latent noise.

Any suggestions?

Privac

May 1

Use the Q_8 version. It's the basic rule of thumb unless some ultra nerd creates a new quantisation algorithm: fp32 > fp16 = Q_8 > bf16 = Q_6 > fp8_scaled > fp8 (that last step is more of an "=" than a ">").
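A rough intuition for why a scaled fp8 checkpoint can beat a plain fp8 cast: float8_e4m3fn has only 3 mantissa bits and a finite range of ±448, so small weights cast directly land in the coarse low end of the format. The sketch below simulates that rounding in pure Python and compares a plain cast with per-tensor absmax scaling (store one scale per tensor, multiply it back when dequantizing). This is an assumption about what "scaled" means here; the scheme actually used by the checkpoints mentioned above may differ, and the weights are made up for illustration.

```python
import math

E4M3_MAX = 448.0       # largest finite float8_e4m3fn value: 1.75 * 2**8
E4M3_MIN_EXP = -6      # smallest normal exponent; below 2**-6 values are subnormal
MANTISSA_BITS = 3      # e4m3: 3 explicit mantissa bits


def round_to_e4m3fn(x: float) -> float:
    """Simulate casting a Python float to float8_e4m3fn (round to nearest, ties to even).

    Simplification: magnitudes above 448 saturate to +/-448 here.
    """
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    _, exp = math.frexp(a)                # a = m * 2**exp with 0.5 <= m < 1
    e = max(exp - 1, E4M3_MIN_EXP)        # subnormals share the spacing of the lowest normal binade
    step = 2.0 ** (e - MANTISSA_BITS)     # gap between neighbouring representable values
    return sign * round(a / step) * step  # Python's round() breaks ties to even


def quantize_direct(weights):
    """Plain cast: each weight is rounded straight to e4m3fn."""
    return [round_to_e4m3fn(w) for w in weights]


def quantize_scaled(weights):
    """Per-tensor absmax scaling: map the largest |w| onto 448, round,
    then multiply the stored scale back in (dequantize)."""
    absmax = max(abs(w) for w in weights)
    scale = absmax / E4M3_MAX if absmax > 0 else 1.0
    return [round_to_e4m3fn(w / scale) * scale for w in weights]


def mean_abs_error(ref, approx):
    return sum(abs(r - q) for r, q in zip(ref, approx)) / len(ref)


# Hypothetical small weights, the kind of magnitudes a linear layer often has
weights = [0.0002 * i for i in range(-20, 21)]
err_direct = mean_abs_error(weights, quantize_direct(weights))
err_scaled = mean_abs_error(weights, quantize_scaled(weights))
print(f"direct cast: {err_direct:.2e}   absmax-scaled: {err_scaled:.2e}")
```

With these made-up weights the direct cast rounds the smallest values to zero, while the scaled version spreads them across the whole e4m3 range, so its error is much lower. The price is one extra scale per tensor.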

GTManiK

May 1


edited May 1

On my machine (RTX 4070, 12 GB VRAM), with torch.compile and SageAttention:

Q6_K -> around 3.1 s/it
Q_8 -> around 2.6 s/it
fp8_scaled (using either 'default' or 'fp8_e4m3fn' dtype) -> around 2.2 s/it
fp8_scaled (using 'fp8_e4m3fn_fast' dtype) -> around 1.3 s/it

So the last option is twice as fast as Q_8 in my case...
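The "twice as fast" figure follows directly from the seconds-per-iteration numbers: since lower s/it is better, the speedup over a baseline is the baseline's s/it divided by the variant's s/it. A small sketch using the timings listed above (the dictionary labels are just shorthand for those four setups):

```python
# s/it figures from the list above (RTX 4070, torch.compile + SageAttention)
timings = {
    "Q6_K": 3.1,
    "Q_8": 2.6,
    "fp8_scaled (default / fp8_e4m3fn)": 2.2,
    "fp8_scaled (fp8_e4m3fn_fast)": 1.3,
}


def speedup_vs(baseline: str, timings: dict) -> dict:
    """Speedup of each variant relative to `baseline` (higher = faster).

    With seconds per iteration, speedup = baseline_time / variant_time.
    """
    base = timings[baseline]
    return {name: base / t for name, t in timings.items()}


for name, s in speedup_vs("Q_8", timings).items():
    print(f"{name:36s} {s:.2f}x")  # the fp8_e4m3fn_fast entry comes out at 2.00x
```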

Quality-wise, fp8_scaled with 'fp8_e4m3fn_fast' is better than the stock Chroma model with the same dtype, as you can see here:

Scaled (v26):

Stock (v26):

Here is a downscaled side-by-side comparison for convenience (scaled on the left, stock on the right):

levzzz

May 1

Oh, you got torch.compile to work? I'd like to have your workflow too...

GTManiK

May 1


edited May 1

@levzzz, I think the workflow should be embedded in the first two images above (the ones with the flower lady)...

Note that I merge two LoRAs, both at weight 0.4, to speed things up a little:

Plus a ton of CFG trickery, Detail Daemon and so on, which might be irrelevant to you.
With the above LoRAs it converges in about 30-35 steps instead of 45-50 with dpmpp_2m / simple.
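For readers unfamiliar with merging LoRAs at a given weight: a LoRA stores a low-rank update B·A for a layer, and merging it at strength s typically adds s · (alpha / rank) · B·A to the base weight. The sketch below assumes that common convention; the exact scaling ComfyUI's LoRA loader applies may differ, and the dict layout (`A`, `B`, `alpha`, `strength`) is hypothetical, chosen only for illustration.

```python
def matmul(a, b):
    """Naive matrix product of two lists-of-lists (a: n x k, b: k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_loras(base, loras):
    """Return base + sum(strength * (alpha / rank) * B @ A) over all LoRAs.

    Each LoRA is a dict with hypothetical keys:
      'A' (rank x in), 'B' (out x rank), 'alpha', 'strength'.
    """
    merged = [row[:] for row in base]          # copy, leave the base weight untouched
    for lora in loras:
        rank = len(lora["A"])
        factor = lora["strength"] * lora["alpha"] / rank
        delta = matmul(lora["B"], lora["A"])   # low-rank update, same shape as base
        for i, row in enumerate(delta):
            for j, d in enumerate(row):
                merged[i][j] += factor * d
    return merged


# Two toy rank-1 LoRAs on a 2x2 identity weight, both merged at strength 0.4
base = [[1.0, 0.0], [0.0, 1.0]]
lora_a = {"A": [[1.0, 0.0]], "B": [[1.0], [0.0]], "alpha": 1.0, "strength": 0.4}
lora_b = {"A": [[0.0, 1.0]], "B": [[0.0], [1.0]], "alpha": 1.0, "strength": 0.4}
merged = merge_loras(base, [lora_a, lora_b])
print(merged)
```

Because the merged deltas are simply added to the base weight, merging both LoRAs once up front costs nothing extra per sampling step.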

~~Important! For torch.compile to work you have to locate 'vcvarsall.bat' (it ships with the MSVC build tools) and run it before starting ComfyUI, like so:~~

call ..\some_path\vcvarsall.bat x64
RunYourComfyUiBatFile.bat

~~This is because, for some reason, Chroma specifically needs extra include paths to compile; otherwise it complains about a missing header file such as 'algorithm.h'~~
Update: now that the Chroma code has been merged into ComfyUI master (using standard nodes), running 'vcvarsall.bat' is no longer required.

EnragedAntelope

May 2

Hi, thanks! I would love to see that workflow as well. However, when I saved the first two images (in PNG format), Comfy did not find the workflow. Could you post it separately (as JSON)?
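ComfyUI normally saves the workflow JSON inside PNG text chunks (under the keys `prompt` and `workflow`), so one way to check whether a downloaded image still carries it is to list its `tEXt` chunks. A minimal stdlib-only sketch; it ignores compressed `zTXt`/`iTXt` chunks, which is a simplifying assumption, and the demo builds a tiny fake PNG in memory rather than reading a real file.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(ctype: bytes, body: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


def png_text_chunks(data: bytes) -> dict:
    """Return {keyword: text} for every tEXt chunk in a PNG file's bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
        if ctype == b"tEXt":
            key, _, text = body.partition(b"\x00")  # keyword NUL text, both Latin-1
            chunks[key.decode("latin-1")] = text.decode("latin-1")
        elif ctype == b"IEND":
            break
    return chunks


# Demo: a fake PNG holding only a tEXt "workflow" chunk and IEND
fake = PNG_SIGNATURE + make_chunk(b"tEXt", b'workflow\x00{"nodes": []}') + make_chunk(b"IEND", b"")
print(png_text_chunks(fake))
```

For a real file, pass the bytes from `open(path, "rb").read()` and `json.loads()` the `workflow` value if it is present. An empty result means the workflow is not stored as a plain `tEXt` chunk in that file (it may have been stripped on upload, or stored differently).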

GTManiK

29 days ago

@levzzz , @EnragedAntelope

Here's a Civitai image; the workflow should be embedded in it: https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/83eb4411-4d19-4608-ae73-3b6b7d327b14/original=true,quality=90/rgthree.compare._temp_glcgi_00025_.jpeg
