shipeng luo

luoagent

AI & ML interests

ML AI

Recent Activity

upvoted an article 1 day ago

Fine-tune Llama 2 with DPO

upvoted a paper 2 days ago

Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models

upvoted a paper 2 days ago

On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation

Organizations

None yet

upvoted an article 1 day ago

Article

Fine-tune Llama 2 with DPO

Aug 8, 2023

upvoted 8 papers 2 days ago

Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models

Paper • 2603.17051 • Published 13 days ago • 106

On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation

Paper • 2603.22117 • Published 7 days ago • 27

Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs

Paper • 2603.22446 • Published 7 days ago • 6

MemMA: Coordinating the Memory Cycle through Multi-Agent Reasoning and In-Situ Self-Evolution

Paper • 2603.18718 • Published 11 days ago • 8

MuRF: Unlocking the Multi-Scale Potential of Vision Foundation Models

Paper • 2603.25744 • Published 4 days ago • 9

upvoted 11 papers 3 days ago

SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

Paper • 2603.23483 • Published 6 days ago • 58

Qwen3-Coder-Next Technical Report

Paper • 2603.00729 • Published about 1 month ago • 61

LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning

Paper • 2603.21065 • Published 9 days ago • 75

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Paper • 2603.25040 • Published 5 days ago • 117

Mixture-of-Depths Attention

Paper • 2603.15619 • Published 14 days ago • 79

Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs

Paper • 2603.16932 • Published 16 days ago • 84

Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

Paper • 2603.06569 • Published 24 days ago • 117

OpenClaw-RL: Train Any Agent Simply by Talking

Paper • 2603.10165 • Published 20 days ago • 145

OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data

Paper • 2603.15594 • Published 14 days ago • 148

Qianfan-OCR: A Unified End-to-End Model for Document Intelligence

Paper • 2603.13398 • Published 19 days ago • 151

OmniLottie: Generating Vector Animations via Parameterized Lottie Tokens

Paper • 2603.02138 • Published 28 days ago • 150
