ByteDance/MTVQA
Generate a speaker-labeled transcript from an audio file
Generate captions for your images instantly
Analyze images and videos with various tasks like captioning, detection, and OCR