Qinghong (Kevin) Lin

KevinQHLin

http://qhlin.me/

AI & ML interests

Vision-Language Model, Video Understanding, Agent

Recent Activity

upvoted a paper 9 days ago

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents

authored a paper 9 days ago

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents

liked a dataset 10 days ago

markov-ai/computer-use-large

Organizations

authored a paper 9 days ago

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents

Paper • 2604.07429 • Published 17 days ago • 114

authored a paper 29 days ago

CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents

Paper • 2603.24440 • Published about 1 month ago • 98

authored 2 papers 2 months ago

Learning Video Context as Interleaved Multimodal Sequences

Paper • 2407.21757 • Published Jul 31, 2024

Code2World: A GUI World Model via Renderable Code Generation

Paper • 2602.09856 • Published Feb 10 • 202

authored 2 papers 3 months ago

FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection

Paper • 2601.03928 • Published Jan 7 • 16

ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands

Paper • 2512.24965 • Published Dec 31, 2025 • 43

submitted a paper to Daily Papers 3 months ago

ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands

Paper • 2512.24965 • Published Dec 31, 2025 • 43

authored a paper 4 months ago

Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?

Paper • 2512.13281 • Published Dec 15, 2025 • 65

submitted a paper to Daily Papers 4 months ago

Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?

Paper • 2512.13281 • Published Dec 15, 2025 • 65

authored 2 papers 4 months ago

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Paper • 2408.12528 • Published Aug 22, 2024 • 51

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction

Paper • 2503.15661 • Published Mar 19, 2025 • 3

authored 2 papers 5 months ago

Computer-Use Agents as Judges for Generative User Interface

Paper • 2511.15567 • Published Nov 19, 2025 • 54

Grounding Computer Use Agents on Human Demonstrations

Paper • 2511.07332 • Published Nov 10, 2025 • 107

authored a paper 6 months ago

VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

Paper • 2511.02778 • Published Nov 4, 2025 • 103

authored 2 papers 7 months ago

Paper2Video: Automatic Video Generation from Scientific Papers

Paper • 2510.05096 • Published Oct 6, 2025 • 120

Code2Video: A Code-centric Paradigm for Educational Video Generation

Paper • 2510.01174 • Published Oct 1, 2025 • 35

authored a paper 8 months ago

Reinforcement Learning in Vision: A Survey

Paper • 2508.08189 • Published Aug 11, 2025 • 30

authored 2 papers 11 months ago

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Paper • 2505.21497 • Published May 27, 2025 • 109

Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

Paper • 2505.16854 • Published May 22, 2025 • 11

authored a paper about 1 year ago

VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning

Paper • 2503.13444 • Published Mar 17, 2025 • 20
