Text model weight names cause an error when loading
First, thank you for doing this! Super cool, and I'm experimenting with it. I wanted to bring to your attention that the text model weight keys look like

`text_model.text_model.embeddings.position_embedding.weight`

instead of the expected

`text_model.embeddings.position_embedding.weight`

There is an extra `text_model.` prefix on all the text weights, which prevents the checkpoint from loading with transformers.
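In the meantime, a client-side workaround is to rename the keys before loading. Here's a minimal sketch, assuming the weights are in a single safetensors file (both filenames are placeholders):

```python
from safetensors.torch import load_file, save_file

# Load the original checkpoint (placeholder filename).
state_dict = load_file("model.safetensors")

# Collapse the duplicated prefix:
# "text_model.text_model.<rest>" -> "text_model.<rest>"
fixed = {
    key.replace("text_model.text_model.", "text_model.", 1): tensor
    for key, tensor in state_dict.items()
}

save_file(fixed, "model_fixed.safetensors")
```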
Thank you for pointing that out! I goofed on the checkpoint conversion. Should be fixed now, though no guarantee the model works correctly; I'll have to get back to this project in the future and double-check everything.
Another issue I ran into: transformers complains about `max_length` not being specified, even though it's set in the tokenizer's config. So I had to pass it explicitly: `inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt", truncation=True, max_length=256)`
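For reference, here's a self-contained version of that workaround (the checkpoint path and inputs are placeholders; I'm assuming the standard `AutoProcessor` API applies here):

```python
from PIL import Image
from transformers import AutoProcessor

# Placeholder checkpoint path; substitute the actual model ID.
processor = AutoProcessor.from_pretrained("path/to/checkpoint")

image = Image.open("example.jpg")
texts = ["a photo of a cat", "a photo of a dog"]

# Passing max_length explicitly avoids the complaint, even though
# the tokenizer config already sets it.
inputs = processor(
    text=texts,
    images=image,
    padding="max_length",
    truncation=True,
    max_length=256,
    return_tensors="pt",
)
```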