How to use LoRA with this model in Python?

#42 opened by megatrump

Hi, thank you very much for your contribution! I have successfully used this quantized model in ComfyUI and loaded a LoRA with it. Similarly, I can load the model with stable-diffusion.cpp and use a LoRA there. However, I ran into problems when trying to load the model manually from Python so that I can manage the model lifecycle myself. Specifically, I took the code mentioned in #41 and modified it:

import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Load the GGUF-quantized FLUX transformer from a local single-file checkpoint
checkpoint = "models/flux1-dev-q4_0.gguf"
transformer = FluxTransformer2DModel.from_single_file(
    checkpoint,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

# Build the pipeline around the quantized transformer
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

pipe = pipeline.to("cuda")
pipe.enable_model_cpu_offload()

# This call raises the error below
pipe.load_lora_weights("models/LoRAs/30.safetensors")

Then, I received the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[7], line 2
      1 with torch.inference_mode():
----> 2     pipe.load_lora_weights("/home/xxx/models/Flux-1.dev-Q4/30.safetensors")

File ~/Services/FluxLoRAQuantitativeQuest/.venv/lib/python3.12/site-packages/diffusers/loaders/lora_pipeline.py:1550, in FluxLoraLoaderMixin.load_lora_weights(self, pretrained_model_name_or_path_or_dict, adapter_name, **kwargs)
   1543 transformer_norm_state_dict = {
   1544     k: state_dict.pop(k)
   1545     for k in list(state_dict.keys())
   1546     if "transformer." in k and any(norm_key in k for norm_key in self._control_lora_supported_norm_keys)
   1547 }
   1549 transformer = getattr(self, self.transformer_name) if not hasattr(self, "transformer") else self.transformer
-> 1550 has_param_with_expanded_shape = self._maybe_expand_transformer_param_shape_or_error_(
   1551     transformer, transformer_lora_state_dict, transformer_norm_state_dict
   1552 )
   1554 if has_param_with_expanded_shape:
   1555     logger.info(
   1556         "The LoRA weights contain parameters that have different shapes that expected by the transformer. "
   1557         "As a result, the state_dict of the transformer has been expanded to match the LoRA parameter shapes. "
   1558         "To get a comprehensive list of parameter names that were modified, enable debug logging."
   1559     )

File ~/Services/FluxLoRAQuantitativeQuest/.venv/lib/python3.12/site-packages/diffusers/loaders/lora_pipeline.py:2020, in FluxLoraLoaderMixin._maybe_expand_transformer_param_shape_or_error_(cls, transformer, lora_state_dict, norm_state_dict, prefix)
   2017 parent_module = transformer.get_submodule(parent_module_name)
   2019 with torch.device("meta"):
-> 2020     expanded_module = torch.nn.Linear(
   2021         in_features, out_features, bias=bias, dtype=module_weight.dtype
   2022     )
   2023 # Only weights are expanded and biases are not. This is because only the input dimensions
   2024 # are changed while the output dimensions remain the same. The shape of the weight tensor
   2025 # is (out_features, in_features), while the shape of bias tensor is (out_features,), which
   2026 # explains the reason why only weights are expanded.
   2027 new_weight = torch.zeros_like(
   2028     expanded_module.weight.data, device=module_weight.device, dtype=module_weight.dtype
   2029 )

File ~/Services/FluxLoRAQuantitativeQuest/.venv/lib/python3.12/site-packages/torch/nn/modules/linear.py:105, in Linear.__init__(self, in_features, out_features, bias, device, dtype)
    103 self.in_features = in_features
    104 self.out_features = out_features
--> 105 self.weight = Parameter(
    106     torch.empty((out_features, in_features), **factory_kwargs)
    107 )
    108 if bias:
    109     self.bias = Parameter(torch.empty(out_features, **factory_kwargs))

File ~/Services/FluxLoRAQuantitativeQuest/.venv/lib/python3.12/site-packages/torch/nn/parameter.py:46, in Parameter.__new__(cls, data, requires_grad)
     42     data = torch.empty(0)
     43 if type(data) is torch.Tensor or type(data) is Parameter:
     44     # For ease of BC maintenance, keep this path for standard Tensor.
     45     # Eventually (tm), we should change the behavior for standard Tensor to match.
---> 46     return torch.Tensor._make_subclass(cls, data, requires_grad)
     48 # Path for custom tensors: set a flag on the instance to indicate parameter-ness.
     49 t = data.detach().requires_grad_(requires_grad)

RuntimeError: Only Tensors of floating point and complex dtype can require gradients

How should I load my LoRA? I’m really looking forward to your reply!

Hello, did you ever get this working?

I have three workarounds, but I don't find any of them particularly elegant. Here are my approaches:

  1. Loading via the diffusers library (GGUF transformer swapped into the full pipeline):

    import torch
    from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

    # Load the full FLUX.1-dev pipeline first (text encoders, VAE, scheduler, ...)
    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev",
        torch_dtype=torch.bfloat16,
        local_files_only=True,
    )
    # Then replace its transformer with the GGUF-quantized one
    pipe.transformer = FluxTransformer2DModel.from_single_file(
        "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q2_K.gguf",
        quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
        torch_dtype=torch.bfloat16,
    )
    # Load the LoRA before moving the pipeline to the GPU
    pipe.load_lora_weights("LoRA/xxxxx.safetensors")
    pipe = pipe.to("cuda")
    

    This method still requires you to download the complete "black-forest-labs/FLUX.1-dev" checkpoint, which takes up a lot of storage space and loads very slowly.

  2. Loading using ComfyUI:

    • Load the GGUF in ComfyUI and ensure it can correctly output images.
    • Export the ComfyUI Workflow as a Python Script using the atmaranto/ComfyUI-SaveAsScript library.
    • You can modify this Python script. Based on my testing, the exported script can properly load GGUF and supports multiple runs on a server without memory issues.
    • You might need to manually debug the input and output formats of each node, since the default output format of the nodes does not match the order Pillow expects.
  3. Calling the pre-release binary build of leejet/stable-diffusion.cpp from the command line (this is very slow on my device); a rough wrapper is sketched below.
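
    For reference, here is a minimal sketch of driving this from Python via subprocess. The binary path, model paths, LoRA name, and prompt are placeholders, and the exact flags plus the <lora:name:strength> prompt syntax can differ between stable-diffusion.cpp releases, so check sd --help for your build:

    import subprocess

    # All paths below are placeholders; adjust them to your local files and binary.
    cmd = [
        "./sd",                                              # stable-diffusion.cpp binary
        "--diffusion-model", "models/flux1-dev-q4_0.gguf",   # quantized FLUX transformer
        "--vae", "models/ae.safetensors",
        "--clip_l", "models/clip_l.safetensors",
        "--t5xxl", "models/t5xxl_fp16.safetensors",
        "--lora-model-dir", "models/LoRAs",
        # The LoRA is referenced inside the prompt as <lora:filename:strength>,
        # where the filename (without extension) is looked up in --lora-model-dir.
        "-p", "a photo of a cat <lora:30:0.8>",
        "--cfg-scale", "1.0",
        "--sampling-method", "euler",
        "-o", "output.png",
    ]
    subprocess.run(cmd, check=True)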


For my personal API service, I currently deploy with Method 1, since it makes loading and unloading models convenient. However, for more complex functionality, such as loading multiple LoRAs with different weights or using other, more unusual quantized variants, only ComfyUI seems to handle it well.
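
As a side note on the multi-LoRA point: the stock diffusers API for combining several LoRAs with different weights is load_lora_weights with an adapter_name plus set_adapters, sketched below. I have not verified how well this behaves on top of the GGUF-quantized transformer, and the file names here are just placeholders:

    # Sketch only: assumes `pipe` is the pipeline built in Method 1 above,
    # and that the LoRA paths/names are placeholders.
    pipe.load_lora_weights("LoRA/style.safetensors", adapter_name="style")
    pipe.load_lora_weights("LoRA/detail.safetensors", adapter_name="detail")

    # Activate both adapters with different strengths.
    pipe.set_adapters(["style", "detail"], adapter_weights=[0.8, 0.4])

    image = pipe("a photo of a cat", num_inference_steps=20, guidance_scale=3.5).images[0]
    image.save("out.png")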

I've looked into a lot of resources, but I still haven't fully figured out how to manage the transformer models more conveniently and efficiently. I hope my approaches give you some inspiration, and if you or anyone else reading this has a better method, please share it. Thanks!
