GGUF model with architecture gemma3 is not supported yet
#2
by kieransmith - opened
I'm using the following code to try and get this working:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_id = "ZeroWw/gemma-3-4b-it-abliterated-GGUF"
filename = "gemma-3-4b-it-abliterated.q8q4.gguf"
torch_dtype = torch.float16
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename, torch_dtype=torch_dtype)
inputs = tokenizer.encode("Test message", return_tensors='pt')
outputs = model.generate(inputs, max_length=50, num_return_sequences=1)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
But I get the following error:
Traceback (most recent call last):
File "...", line 8, in <module>
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
File ".../Library/Python/3.9/lib/python/site-packages/transformers/models/auto/tokenization_auto.py", line 927, in from_pretrained
config_dict = load_gguf_checkpoint(gguf_path, return_tensors=False)["config"]
File ".../Library/Python/3.9/lib/python/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 401, in load_gguf_checkpoint
raise ValueError(f"GGUF model with architecture {architecture} is not supported yet.")
ValueError: GGUF model with architecture gemma3 is not supported yet.
Are you able to help point me in the right direction with this please?
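(A quick check that is my suggestion, not something confirmed in this thread: the architecture whitelist that raises this ValueError ships inside the installed transformers package, so it is worth confirming which release is actually in use before digging further.)

import transformers  # sanity check only: report the installed release
print(transformers.__version__)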
I use the quants with llama.cpp / koboldcpp.
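For anyone who just wants the model generating text, a rough sketch of that llama.cpp route via the llama-cpp-python bindings is below. The Llama.from_pretrained call, the n_ctx value, and max_tokens are my assumptions about a typical setup, not something posted in this thread (pip install llama-cpp-python huggingface-hub first).

from llama_cpp import Llama

# Download the GGUF from the Hub and load it with llama.cpp instead of transformers.
llm = Llama.from_pretrained(
    repo_id="ZeroWw/gemma-3-4b-it-abliterated-GGUF",
    filename="gemma-3-4b-it-abliterated.q8q4.gguf",
    n_ctx=2048,  # context window; adjust as needed
)

out = llm("Test message", max_tokens=50)  # plain completion call
print(out["choices"][0]["text"])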
ZeroWw changed discussion status to closed
Same error for me. What is the solution? It works with llama.cpp and kobold.cpp.
ZeroWw changed discussion status to open
ZeroWw changed discussion status to closed