GGUF model with architecture gemma3 is not supported yet

#2
by kieransmith - opened

I'm using the following code to try to get this working:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "ZeroWw/gemma-3-4b-it-abliterated-GGUF"
filename = "gemma-3-4b-it-abliterated.q8q4.gguf"

torch_dtype = torch.float16
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename, torch_dtype=torch_dtype)

inputs = tokenizer.encode("Test message", return_tensors='pt')

outputs = model.generate(inputs, max_length=50, num_return_sequences=1)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(text)

But I get the following error:

Traceback (most recent call last):
  File "...", line 8, in <module>
    tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
  File ".../Library/Python/3.9/lib/python/site-packages/transformers/models/auto/tokenization_auto.py", line 927, in from_pretrained
    config_dict = load_gguf_checkpoint(gguf_path, return_tensors=False)["config"]
  File ".../Library/Python/3.9/lib/python/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 401, in load_gguf_checkpoint
    raise ValueError(f"GGUF model with architecture {architecture} is not supported yet.")
ValueError: GGUF model with architecture gemma3 is not supported yet.

Are you able to help point me in the right direction with this, please?

Owner

I use the quants with llama.cpp / koboldcpp.
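For reference, here is a rough sketch of that route using llama-cpp-python (just an illustration, not a tested recipe: the context size and sampling parameters are assumptions, and gemma3 GGUFs need a reasonably recent llama.cpp build underneath):

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_id = "ZeroWw/gemma-3-4b-it-abliterated-GGUF"
filename = "gemma-3-4b-it-abliterated.q8q4.gguf"

# Download the GGUF file from the Hub, then load it with llama.cpp's Python bindings.
gguf_path = hf_hub_download(repo_id=model_id, filename=filename)
llm = Llama(model_path=gguf_path, n_ctx=4096)

# Plain completion, mirroring the original transformers snippet.
output = llm("Test message", max_tokens=50)
print(output["choices"][0]["text"])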

ZeroWw changed discussion status to closed

Same error for me.
What is the solution?

Owner

It works with llama.cpp and kobold.cpp.
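Since this is an instruction-tuned model, the chat-style interface is usually the better fit; the sketch below repeats the same assumed setup as above, and on recent llama-cpp-python versions the chat template should be picked up from the GGUF metadata:

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

gguf_path = hf_hub_download(
    repo_id="ZeroWw/gemma-3-4b-it-abliterated-GGUF",
    filename="gemma-3-4b-it-abliterated.q8q4.gguf",
)
llm = Llama(model_path=gguf_path, n_ctx=4096)

# create_chat_completion formats the prompt with the model's chat template
# and returns an OpenAI-style response dict.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Test message"}],
    max_tokens=50,
)
print(response["choices"][0]["message"]["content"])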

ZeroWw changed discussion status to open
ZeroWw changed discussion status to closed
