Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms Paper • 2603.28489 • Published 2 days ago • 25
Thanks for your attention! We tried two vision encoders and three LLMs. The best performance is achieved by combining SigLIP-SO with Phi-2. We have released the weights for all combinations. Please refer to our GitHub for more information.