Quant precision vs quality
It's worth mentioning that 'float8_e4m3fn' is WAY worse in quality than BF16. This was already the case for Flux, but the difference there wasn't as dramatic. Is there any way to improve it?
I've seen 'scaled' 8-bit variants of Chroma, but they only output latent noise: https://huggingface.co/Clybius/Chroma-fp8-scaled
Any suggestions?
Use the Q_8 version. It's the basic rule of thumb, unless some ultra nerd comes up with a new quantisation algorithm: fp32 > fp16 = Q_8 > bf16 = Q_6 > fp8_scaled > fp8 (that last '>' is closer to '=')
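To see why a 'scaled' fp8 checkpoint can beat a plain fp8 cast, here is a minimal pure-Python simulation of float8_e4m3fn rounding (4 exponent bits, 3 mantissa bits, max value 448). It assumes a single per-tensor scale, which is roughly the idea behind 'scaled' fp8 checkpoints; the exact scheme used by any particular checkpoint may differ, and this is not how PyTorch or ComfyUI implement the cast. It only measures quantization error on a made-up tensor; how that translates into image quality is another matter.

```python
import math
import random

E4M3_MAX = 448.0      # largest finite float8_e4m3fn value
E4M3_MIN_EXP = -6     # exponent of the smallest normal number (2**-6)
MANTISSA_BITS = 3

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest float8_e4m3fn value (ties to even), saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    _, exp = math.frexp(a)             # a = m * 2**exp with 0.5 <= m < 1
    e = max(exp - 1, E4M3_MIN_EXP)     # below 2**-6 the spacing stays fixed (subnormals)
    step = 2.0 ** (e - MANTISSA_BITS)  # distance between representable values here
    q = round(a / step) * step         # Python's round() is round-half-to-even
    return sign * min(q, E4M3_MAX)

def quantize_plain(weights):
    """Cast directly to fp8, like a stock fp8 checkpoint."""
    return [quantize_e4m3(w) for w in weights]

def quantize_scaled(weights):
    """Per-tensor scaling: map max|w| onto the fp8 range, quantize, then scale back."""
    scale = max(abs(w) for w in weights) / E4M3_MAX
    if scale == 0.0:
        return list(weights)
    return [quantize_e4m3(w / scale) * scale for w in weights]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical weight tensor with small magnitudes, typical for transformer weights.
    weights = [rng.gauss(0.0, 0.005) for _ in range(10_000)]
    print("plain fp8  MSE:", mse(weights, quantize_plain(weights)))
    print("scaled fp8 MSE:", mse(weights, quantize_scaled(weights)))
```

Small weights fall into fp8's coarse subnormal range when cast directly; scaling first spreads them over the whole representable range, so the relative rounding error drops.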
On my machine (RTX 4070, 12 GB VRAM), with torch.compile and SageAttention:
Q6_K -> around 3.1 s/it
Q_8 -> around 2.6 s/it
fp8_scaled (using either 'default' or 'fp8_e4m3fn' dtype) -> around 2.2 s/it
fp8_scaled (using 'fp8_e4m3fn_fast' dtype) -> around 1.3 s/it
so in my case the last option is twice as fast as Q_8...
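The speedups implied by those s/it figures can be checked with a few lines of arithmetic. The numbers are the ones reported above; the dictionary keys are just labels chosen for this sketch.

```python
# Seconds per iteration reported above (RTX 4070, torch.compile + SageAttention).
timings = {
    "Q6_K": 3.1,
    "Q_8": 2.6,
    "fp8_scaled (default / fp8_e4m3fn)": 2.2,
    "fp8_scaled (fp8_e4m3fn_fast)": 1.3,
}

def speedups(timings: dict[str, float], baseline: str) -> dict[str, float]:
    """How many times faster each variant is than the baseline (higher is faster)."""
    base = timings[baseline]
    return {name: base / s_per_it for name, s_per_it in timings.items()}

if __name__ == "__main__":
    for name, factor in speedups(timings, "Q_8").items():
        print(f"{name:36s} {factor:.2f}x vs Q_8")
```

Relative to Q_8 this gives roughly 0.84x for Q6_K, 1.18x for fp8_scaled with the default dtype, and 2.00x with 'fp8_e4m3fn_fast'.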
Quality-wise, fp8_scaled with 'fp8_e4m3fn_fast' is better than the stock Chroma model using the same dtype, as you can see here:
Here is a scaled-down side-by-side for convenience (scaled on the left, stock on the right):
Oh, you got torch.compile to work? I'd like to have your workflow too...
@levzzz, I think the workflow should be embedded in the first two images above, the ones with the flower lady...
Note that I merge two LoRAs, both at weight 0.4, to speed things up a little:
- https://civitai.com/models/686704?modelVersionId=768547
- https://huggingface.co/silveroxides/Chroma-LoRA-Experiments/blob/main/Hyper-Chroma-low-step-LoRA.safetensors
Plus a ton of CFG trickery, Detail Daemon and so on, which might be irrelevant to you.
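For readers unfamiliar with what "merging two LoRAs at weight 0.4" does to a layer, here is a minimal sketch using the common LoRA formulation W' = W + strength * (alpha / rank) * B @ A, with one term per LoRA. It uses plain Python lists; the function and argument names are hypothetical, and ComfyUI's own loader also deals with other LoRA formats and key naming that this sketch ignores.

```python
Matrix = list[list[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def merge_loras(weight: Matrix, loras: list[tuple[Matrix, Matrix, float, float]]) -> Matrix:
    """Fold LoRAs into a base weight: W' = W + sum(strength * alpha / rank * B @ A).

    Each LoRA is (down A: rank x in, up B: out x rank, alpha, strength).
    The input weight is left untouched; a merged copy is returned.
    """
    merged = [row[:] for row in weight]
    for down, up, alpha, strength in loras:
        rank = len(down)
        delta = matmul(up, down)          # low-rank update, shape out x in
        factor = strength * alpha / rank
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += factor * value
    return merged

if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    # Two hypothetical rank-1 LoRAs, both applied at strength 0.4.
    lora_a = ([[1.0, 2.0]], [[1.0], [0.0]], 1.0, 0.4)
    lora_b = ([[0.0, 1.0]], [[0.0], [1.0]], 1.0, 0.4)
    print(merge_loras(W, [lora_a, lora_b]))
```

Because the update is folded into the weights, the sampler afterwards sees an ordinary weight matrix, and stacking LoRAs is just adding their scaled deltas.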
With the above LoRAs it converges in about 30-35 steps instead of 45-50 with dpmpp_2m / simple.
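A quick estimate of what 30-35 instead of 45-50 steps saves, using the midpoint of each range. The wall-clock figures assume the 1.3 s/it 'fp8_e4m3fn_fast' number reported in this thread still holds with the LoRAs merged; that pairing is an assumption, not a measurement.

```python
def step_reduction(before: tuple[int, int], after: tuple[int, int]) -> float:
    """Fraction of steps saved, comparing the midpoints of two step ranges."""
    mid_before = (before[0] + before[1]) / 2
    mid_after = (after[0] + after[1]) / 2
    return 1 - mid_after / mid_before

def sampling_seconds(steps: float, s_per_it: float) -> float:
    """Approximate sampling time: steps times seconds per iteration."""
    return steps * s_per_it

if __name__ == "__main__":
    saved = step_reduction(before=(45, 50), after=(30, 35))
    print(f"~{saved:.0%} fewer steps")
    # Assumption: 1.3 s/it applies both with and without the merged LoRAs.
    for label, steps in (("without LoRAs", 47.5), ("with LoRAs", 32.5)):
        print(f"{label}: ~{sampling_seconds(steps, 1.3):.0f} s")
```

That works out to about 32% fewer steps, i.e. roughly 62 s versus 42 s of sampling per image under the stated assumption.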
Important! For torch.compile to work, you have to find 'vcvarsall.bat' (it comes with the MSVC build tools) and run it before launching ComfyUI, like so:
call ..\some_path\vcvarsall.bat x64
RunYourComfyUiBatFile.bat
This is because, for some reason, Chroma specifically needs extra include paths in order to compile; otherwise it complains about a missing header file such as 'algorithm.h'.
Update: now that the Chroma code is merged into ComfyUI master (using standard nodes), invoking 'vcvarsall.bat' is no longer required.
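For older setups where the 'vcvarsall.bat' step is still needed, a small check before launching can tell whether it actually took effect. This is a hypothetical helper, not part of ComfyUI or PyTorch: it only looks for the two things vcvarsall.bat is known to set up, the cl compiler on PATH and a non-empty INCLUDE variable.

```python
import os
import shutil
from typing import Mapping, Optional

def msvc_env_status(env: Optional[Mapping[str, str]] = None) -> dict[str, bool]:
    """Report whether the MSVC compiler and its include paths look available.

    vcvarsall.bat puts cl.exe on PATH and fills the INCLUDE variable; without
    INCLUDE, the compiler cannot locate the standard headers.
    """
    env = os.environ if env is None else env
    cl_path = shutil.which("cl", path=env.get("PATH", ""))
    include_dirs = [d for d in env.get("INCLUDE", "").split(os.pathsep) if d]
    return {"cl_on_path": cl_path is not None, "include_set": bool(include_dirs)}

if __name__ == "__main__":
    status = msvc_env_status()
    print(status)
    if not all(status.values()):
        print("MSVC environment not detected; run 'vcvarsall.bat x64' first.")
```

Run it from the same console session right after calling vcvarsall.bat; both values should then be True.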
Hi, thanks! I would love to see that workflow as well. However, when I saved the first two images (in PNG format), ComfyUI did not find the workflow. Could you post it separately (as JSON)?
Here's a Civitai image; the workflow should be embedded in it: https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/83eb4411-4d19-4608-ae73-3b6b7d327b14/original=true,quality=90/rgthree.compare._temp_glcgi_00025_.jpeg