ValueError: Invalid image type. Expected either PIL.Image.Image, numpy.ndarray, torch.Tensor, tf.Tensor or jax.ndarray, but got <class 'list'>.
#44, opened by do-me
Running on Linux, CPU-only, Python 3.13, I get this error when running with onnxruntime:
ValueError: Invalid image type. Expected either PIL.Image.Image, numpy.ndarray, torch.Tensor, tf.Tensor or jax.ndarray, but got <class 'list'>.
import onnxruntime as ort
from transformers import AutoImageProcessor, AutoTokenizer
# Load tokenizer and image processor using transformers
tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-clip-v2', trust_remote_code=True)
image_processor = AutoImageProcessor.from_pretrained(
    'jinaai/jina-clip-v2', trust_remote_code=True
)
# Corpus
sentences = [
    'غروب جميل على الشاطئ', # Arabic
    '海滩上美丽的日落', # Chinese
    'Un beau coucher de soleil sur la plage', # French
    'Ein wunderschöner Sonnenuntergang am Strand', # German
    'Ένα όμορφο ηλιοβασίλεμα πάνω από την παραλία', # Greek
    'समुद्र तट पर एक खूबसूरत सूर्यास्त', # Hindi
    'Un bellissimo tramonto sulla spiaggia', # Italian
    '浜辺に沈む美しい夕日', # Japanese
    '해변 위로 아름다운 일몰', # Korean
]
# Public image URLs or PIL Images
image_urls = ['https://i.ibb.co/nQNGqL0/beach1.jpg', 'https://i.ibb.co/r5w8hG8/beach2.jpg']
# Tokenize input texts and transform input images
input_ids = tokenizer(sentences, return_tensors='np')['input_ids']
pixel_values = image_processor(image_urls)['pixel_values']
# Start an ONNX Runtime Session
session = ort.InferenceSession('jina-clip-v2/onnx/model.onnx')
# Run inference
output = session.run(None, {'input_ids': input_ids, 'pixel_values': pixel_values})
# Keep the normalized embeddings; the first two outputs are unnormalized
_, _, text_embeddings, image_embeddings = output
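A possible workaround, offered as a sketch rather than a confirmed fix: the error message lists only in-memory image types (PIL images, arrays, tensors), while the script above passes a list of URL strings. Assuming the image processor does not fetch URLs itself, the images can be loaded into PIL objects first. `load_image` below is a hypothetical helper, not part of transformers or jina-clip-v2.

```python
from io import BytesIO
from urllib.request import urlopen

from PIL import Image


def load_image(source):
    """Hypothetical helper: return an RGB PIL image.

    Accepts a PIL image, an http(s) URL, a local path, or a file-like object.
    """
    if isinstance(source, Image.Image):
        # Already an image: only normalize the color mode.
        return source.convert('RGB')
    if isinstance(source, str) and source.startswith(('http://', 'https://')):
        # Download the bytes, then decode them in memory.
        with urlopen(source, timeout=30) as response:
            return Image.open(BytesIO(response.read())).convert('RGB')
    # Anything else is handed to Pillow as a path or file-like object.
    return Image.open(source).convert('RGB')
```

With the names from the script above, this would be used as `images = [load_image(url) for url in image_urls]` followed by `pixel_values = image_processor(images)['pixel_values']`. Whether this alone resolves the error with the jina-clip-v2 processor has not been verified here.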