Congrats on this, and thanks for supporting GGUF models! However, the model isn't loading correctly out of the box. I haven't really dug into debugging yet, just did a quick check and attempted to load it with llama-server from the CLI:
```
/git/llama.cpp/build/bin/llama-server -c 0 --top-p 0.95 --temp 0.7 -ngl 2 -m /models/Magistral-Small-2506_gguf/ --host 0.0.0.0 --port 7070
```
results in:
```
main: loading model
srv    load_model: loading model '/models/Magistral-Small-2506_gguf/Magistral-Small-2506.gguf'
gguf_init_from_file_impl: invalid magic characters: 'vers', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /models/Magistral-Small-2506_gguf/Magistral-Small-2506.gguf
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/models/Magistral-Small-2506_gguf/Magistral-Small-2506.gguf'
srv    load_model: failed to load model, '/models/Magistral-Small-2506_gguf/Magistral-Small-2506.gguf'
srv    operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
```
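For what it's worth, the `invalid magic characters: 'vers'` message suggests the file on disk may not be a GGUF at all: valid GGUF files begin with the 4-byte magic `GGUF`, while Git LFS pointer files begin with the text `version https://git-lfs.github.com/spec/v1`, which would match the `vers` bytes llama.cpp is seeing. A minimal sketch to inspect the header yourself (the path below is just my local one):

```python
# Sanity check: a real GGUF file starts with the 4-byte magic b"GGUF".
# An incomplete download or a Git LFS pointer file will start with something else.
path = "/models/Magistral-Small-2506_gguf/Magistral-Small-2506.gguf"

with open(path, "rb") as f:
    magic = f.read(4)

if magic == b"GGUF":
    print("Looks like a valid GGUF header.")
elif magic == b"vers":
    print("Starts with 'vers' -- likely a Git LFS pointer, not the actual model.")
else:
    print(f"Unexpected magic bytes: {magic!r}")
```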
The 8-bit quantized version has the same issue.
For the record, in case it is helpful: I also attempted to run the automated conversion tool at https://huggingface.co/spaces/ggml-org/gguf-my-repo, and the result was:
```
.....
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 32768
INFO:hf-to-gguf:gguf: embedding length = 5120
INFO:hf-to-gguf:gguf: feed forward length = 32768
INFO:hf-to-gguf:gguf: head count = 32
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 1000000000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message
Traceback (most recent call last):
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 1865, in set_vocab
    self._set_vocab_sentencepiece()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 902, in _set_vocab_sentencepiece
    tokens, scores, toktypes = self._create_vocab_sentencepiece()
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 919, in _create_vocab_sentencepiece
    raise FileNotFoundError(f"File not found: {tokenizer_path}")
FileNotFoundError: File not found: downloads/tmpxdyj7bte/Magistral-Small-2506/tokenizer.model

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 1868, in set_vocab
    self._set_vocab_llama_hf()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 997, in _set_vocab_llama_hf
    vocab = gguf.LlamaHfVocab(self.dir_model)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/app/llama.cpp/gguf-py/gguf/vocab.py", line 379, in __init__
    with open(fname_tokenizer, encoding='utf-8') as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'downloads/tmpxdyj7bte/Magistral-Small-2506/tokenizer.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 6533, in <module>
    main()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 6527, in main
    model_instance.write()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 404, in write
    self.prepare_metadata(vocab_only=False)
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 517, in prepare_metadata
    self.set_vocab()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 1871, in set_vocab
    self._set_vocab_gpt2()
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 838, in _set_vocab_gpt2
    tokens, toktypes, tokpre = self.get_vocab_base()
                               ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/app/./llama.cpp/convert_hf_to_gguf.py", line 603, in get_vocab_base
    tokenizer = AutoTokenizer.from_pretrained(self.dir_model)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.12/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 1032, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.12/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2025, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.12/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2063, in _from_pretrained
    slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.12/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2278, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.12/lib/python3.11/site-packages/transformers/models/llama/tokenization_llama.py", line 171, in __init__
    self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.12/lib/python3.11/site-packages/transformers/models/llama/tokenization_llama.py", line 198, in get_spm_processor
    tokenizer.Load(self.vocab_file)
  File "/home/user/.pyenv/versions/3.11.12/lib/python3.11/site-packages/sentencepiece/__init__.py", line 961, in Load
    return self.LoadFromFile(model_file)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.12/lib/python3.11/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: not a string
```
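Reading the chained FileNotFoundErrors, the converter appears to try each vocab strategy in turn: SentencePiece (tokenizer.model), then the HF Llama vocab (tokenizer.json), then finally the AutoTokenizer/GPT-2 path, and none of the expected tokenizer files were present in the download. A rough pre-flight check you could run on the model directory before converting (the directory path is an assumption; adjust for your layout):

```python
from pathlib import Path

# Pre-flight sketch: convert_hf_to_gguf.py falls back through these vocab
# sources (per the traceback above), so at least one file should exist.
model_dir = Path("downloads/Magistral-Small-2506")  # assumed local path

candidates = ["tokenizer.model", "tokenizer.json", "vocab.json"]
found = [name for name in candidates if (model_dir / name).exists()]

if found:
    print(f"Tokenizer files present: {found}")
else:
    print("No tokenizer files found -- the download is probably incomplete, "
          "so conversion will fail in set_vocab() as shown above.")
```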
Hi! Unfortunately I cannot reproduce your issue. Did you try pointing to the actual file instead of the folder in your command? Also, maybe try updating llama.cpp if you haven't.
Unfortunately a very DOH! moment: I never checked the integrity of the model after downloading it :( Redownloading now, and I will report back.
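For anyone else landing here, an easy way to catch this class of problem is to compare the downloaded file's SHA-256 against the checksum shown on the Hugging Face file page. A minimal sketch (the expected hash below is a placeholder, not the real value):

```python
import hashlib

path = "/models/Magistral-Small-2506_gguf/Magistral-Small-2506.gguf"
expected = "0000placeholder"  # copy the real SHA-256 from the HF file page

h = hashlib.sha256()
with open(path, "rb") as f:
    # Hash in 1 MiB chunks so multi-GB model files don't need to fit in memory.
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)

print("OK" if h.hexdigest() == expected else f"Mismatch: {h.hexdigest()}")
```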
And yes, this was my fault. Closing.