- xGen-MM (BLIP-3): A Family of Open Large Multimodal Models (arXiv:2408.08872, published Aug 16, 2024)
- xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations (arXiv:2408.12590, published Aug 22, 2024)
- SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant (arXiv:2403.11299, published Mar 17, 2024)
- xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs (arXiv:2410.16267, published Oct 21, 2024)
- DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models (arXiv:2411.15024, published Nov 22, 2024)
- Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding (arXiv:2502.11492, published Feb 17, 2025)
- BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset (arXiv:2505.09568, published May 14, 2025)
- HoliTom: Holistic Token Merging for Fast Video Large Language Models (arXiv:2505.21334, published May 27, 2025)
- VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents (arXiv:2507.04590, published Jul 7, 2025)
- When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios (arXiv:2507.20198, published Jul 27, 2025)
- UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG (arXiv:2510.03663, published Oct 2025)
- Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models (arXiv:2503.16257, published Mar 20, 2025)