Harold Chen

Harold328

https://haroldchen19.github.io/

HaroldChen19

AI & ML interests

Computer Vision

Recent Activity

liked a model 1 day ago

FastVideo/FastWan2.2-TI2V-5B-FullAttn-Diffusers

Organizations

None yet

upvoted a paper about 21 hours ago

Spatia: Video Generation with Updatable Spatial Memory

Paper • 2512.15716 • Published 10 days ago • 20

upvoted a paper 6 days ago

Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing

Paper • 2512.17909 • Published 8 days ago • 36

upvoted 4 papers 9 days ago

Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation

Paper • 2512.16913 • Published 9 days ago • 33

StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors

Paper • 2512.16915 • Published 9 days ago • 37

SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning

Paper • 2512.13874 • Published 12 days ago • 16

End-to-End Training for Autoregressive Video Diffusion via Self-Resampling

Paper • 2512.15702 • Published 10 days ago • 14

upvoted a paper 10 days ago

DEER: Draft with Diffusion, Verify with Autoregressive Models

Paper • 2512.15176 • Published 11 days ago • 41

upvoted a paper 11 days ago

A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning

Paper • 2512.14442 • Published 11 days ago • 10

upvoted a paper 12 days ago

Memory in the Age of AI Agents

Paper • 2512.13564 • Published 12 days ago • 113

upvoted a paper 18 days ago

OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation

Paper • 2512.08294 • Published 19 days ago • 17

upvoted a paper 20 days ago

PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling

Paper • 2512.04784 • Published 25 days ago • 24

upvoted a paper 23 days ago

4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer

Paper • 2512.05060 • Published 23 days ago • 18

upvoted a paper 25 days ago

DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation

Paper • 2511.23127 • Published 29 days ago • 43

upvoted 3 papers 26 days ago

Accelerating Streaming Video Large Language Models via Hierarchical Token Compression

Paper • 2512.00891 • Published 27 days ago • 14

TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models

Paper • 2512.02014 • Published 26 days ago • 69

Video Generation Models Are Good Latent Reward Models

Paper • 2511.21541 • Published Nov 26 • 45

upvoted 2 papers about 1 month ago

Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens

Paper • 2511.19418 • Published Nov 24 • 27

TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models

Paper • 2511.13704 • Published Nov 17 • 42

upvoted 2 papers 3 months ago

PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs

Paper • 2510.09507 • Published Oct 10 • 10

Go with Your Gut: Scaling Confidence for Autoregressive Image Generation

Paper • 2509.26376 • Published Sep 30 • 9
