GGUF model with architecture gemma3 is not supported yet

#2
by kieransmith - opened

I'm using the following code to try to get this working:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "ZeroWw/gemma-3-4b-it-abliterated-GGUF"
filename = "gemma-3-4b-it-abliterated.q8q4.gguf"

torch_dtype = torch.float16
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename, torch_dtype=torch_dtype)

inputs = tokenizer.encode("Test message", return_tensors='pt')

outputs = model.generate(inputs, max_length=50, num_return_sequences=1)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(text)

But I get the following error:

Traceback (most recent call last):
  File "...", line 8, in <module>
    tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
  File ".../Library/Python/3.9/lib/python/site-packages/transformers/models/auto/tokenization_auto.py", line 927, in from_pretrained
    config_dict = load_gguf_checkpoint(gguf_path, return_tensors=False)["config"]
  File ".../Library/Python/3.9/lib/python/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 401, in load_gguf_checkpoint
    raise ValueError(f"GGUF model with architecture {architecture} is not supported yet.")
ValueError: GGUF model with architecture gemma3 is not supported yet.

Are you able to help point me in the right direction with this, please?

Owner

I use the quants with llama.cpp / koboldcpp.
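For reference, here is a rough sketch of that route using llama-cpp-python (just an illustration, not a tested recipe: the context size and sampling parameters are assumptions, and gemma3 GGUFs need a reasonably recent llama.cpp build underneath):

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_id = "ZeroWw/gemma-3-4b-it-abliterated-GGUF"
filename = "gemma-3-4b-it-abliterated.q8q4.gguf"

# Download the GGUF file from the Hub, then load it with llama.cpp's Python bindings.
gguf_path = hf_hub_download(repo_id=model_id, filename=filename)
llm = Llama(model_path=gguf_path, n_ctx=4096)

# Plain completion, mirroring the original transformers snippet.
output = llm("Test message", max_tokens=50)
print(output["choices"][0]["text"])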

ZeroWw changed discussion status to closed

Same error for me.
What is the solution?

Owner

It works with llama.cpp and kobold.cpp.
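Since this is an instruction-tuned model, the chat-style interface is usually the better fit; the sketch below repeats the same assumed setup as above, and on recent llama-cpp-python versions the chat template should be picked up from the GGUF metadata:

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

gguf_path = hf_hub_download(
    repo_id="ZeroWw/gemma-3-4b-it-abliterated-GGUF",
    filename="gemma-3-4b-it-abliterated.q8q4.gguf",
)
llm = Llama(model_path=gguf_path, n_ctx=4096)

# create_chat_completion formats the prompt with the model's chat template
# and returns an OpenAI-style response dict.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Test message"}],
    max_tokens=50,
)
print(response["choices"][0]["message"]["content"])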

ZeroWw changed discussion status to open
ZeroWw changed discussion status to closed
