Yinxu Pan

cppowboy

https://github.com/Cppowboy

AI & ML interests

RL for LLM, Code & Math Reasoning, Function Calling, Code Interpreter, Vision-Language Pretraining

Recent Activity



upvoted a paper 4 days ago

SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks

Paper • 2603.24755 • Published 5 days ago • 24

liked a dataset 5 days ago

mercor/APEX-SWE

Updated 6 days ago • 3.55k • 20

liked a dataset 7 days ago

mercor/apex-agents

Viewer • Updated 28 days ago • 480 • 43.4k • 105

upvoted a paper 7 days ago

LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning

Paper • 2603.21065 • Published 9 days ago • 75

New activity in Qwen/Qwen3.5-397B-A17B 7 days ago

Can not reproduce evaluation results on SWE-Verified

#63 opened 19 days ago by cppowboy

upvoted a paper 11 days ago

MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild

Paper • 2603.17187 • Published 13 days ago • 134

upvoted 5 papers 12 days ago

Online Experiential Learning for Language Models

Paper • 2603.16856 • Published 13 days ago • 57

TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas

Paper • 2603.16448 • Published 13 days ago • 58

InCoder-32B: Code Foundation Model for Industrial Scenarios

Paper • 2603.16790 • Published 13 days ago • 304

MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification

Paper • 2603.15726 • Published 14 days ago • 183

SWE-Skills-Bench: Do Agent Skills Actually Help in Real-World Software Engineering?

Paper • 2603.15401 • Published 14 days ago • 18

New activity in GAIR/OpenSWE 12 days ago

Are these images publicly available?

#2 opened 12 days ago by cppowboy

liked a dataset 14 days ago

GAIR/OpenSWE

Viewer • Updated 13 days ago • 45.3k • 1.61k • 16

upvoted a paper 15 days ago

daVinci-Env: Open SWE Environment Synthesis at Scale

Paper • 2603.13023 • Published 17 days ago • 30

liked a dataset 16 days ago

stepfun-ai/Step-3.5-Flash-SFT

Viewer • Updated 16 days ago • 1.62M • 50.3k • 288

liked a dataset 17 days ago

TIGER-Lab/WebInstruct-verified

Viewer • Updated Nov 27, 2025 • 462k • 369 • 67

upvoted 3 papers 18 days ago

In-Context Reinforcement Learning for Tool Use in Large Language Models

Paper • 2603.08068 • Published 21 days ago • 42

Can Large Language Models Keep Up? Benchmarking Online Adaptation to Continual Knowledge Streams

Paper • 2603.07392 • Published 23 days ago • 18

OpenClaw-RL: Train Any Agent Simply by Talking

Paper • 2603.10165 • Published 20 days ago • 145

New activity in Qwen/Qwen3-Coder-Next 19 days ago

Amazing , it works with open claw

#39 opened about 1 month ago by infinityai
