From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Haiwen Diao
Paranioar
AI & ML interests
Vision-and-Language, Parameter-efficient Transfer Learning, Multi-modal Large Language Model
Recent Activity
upvoted a paper about 9 hours ago
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
upvoted a paper 6 days ago
Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition
upvoted a paper 17 days ago
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation