From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Haiwen Diao
Paranioar
AI & ML interests
Vision-and-Language, Parameter-efficient Transfer Learning, Multi-modal Large Language Models
Recent Activity
Upvoted a paper (3 days ago): PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image
Upvoted a paper (3 days ago): Simulating the Visual World with Artificial Intelligence: A Roadmap
Upvoted a paper (22 days ago): Uniform Discrete Diffusion with Metric Path for Video Generation