Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows Paper • 2512.16969 • Published 9 days ago
Geometrically-Constrained Agent for Spatial Reasoning Paper • 2511.22659 • Published 29 days ago
RegionE: Adaptive Region-Aware Generation for Efficient Image Editing Paper • 2510.25590 • Published Oct 29
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26
InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis Paper • 2509.10441 • Published Sep 12
A Survey of Context Engineering for Large Language Models Paper • 2507.13334 • Published Jul 17
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data Paper • 2507.07095 • Published Jul 9
FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model Paper • 2410.13925 • Published Oct 17, 2024
What If We Recaption Billions of Web Images with LLaMA-3? Paper • 2406.08478 • Published Jun 12, 2024
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images Paper • 2403.11703 • Published Mar 18, 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model Paper • 2401.16420 • Published Jan 29, 2024
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities Paper • 2401.15071 • Published Jan 26, 2024
Recent models: last 100 repos, sorted by creation date Collection The last 100 repos I have created. Sorted by creation date descending, so the most recently created repos appear at the top. • 121 items • Updated Jan 31, 2024
Tiny LVLM-eHub: Early Multimodal Experiments with Bard Paper • 2308.03729 • Published Aug 7, 2023
NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection Paper • 2307.14620 • Published Jul 27, 2023