Jim White PRO
jimwhite
·
AI & ML interests
None yet
Recent Activity
updated a collection 2 days ago: Coding Benchmarks
liked a Space 14 days ago: webml-community/Gemma-4-WebGPU
updated a collection 28 days ago: Coding Benchmarks
Organizations
RL
-
Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs
Paper • 2512.17008 • Published • 11 -
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper • 2601.05242 • Published • 230 -
ryokamoi/Qwen-2.5-7B-FoVer-PRM-old
Text Generation • 8B • Updated • 328 • 1 -
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Paper • 2601.18778 • Published • 42
PUP
-
DeepCode: Open Agentic Coding
Paper • 2512.07921 • Published • 34 -
nvidia/Nemotron-Pretraining-Code-v2
Viewer • Updated • 836M • 90.7k • 121 -
BEAVER: An Efficient Deterministic LLM Verifier
Paper • 2512.05439 • Published • 36 -
codefuse-ai/C2LLM-7B
Feature Extraction • 8B • Updated • 368 • 10
Verified Agents
Coding Benchmarks
-
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
Paper • 2511.18538 • Published • 304 -
SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models
Paper • 2511.05459 • Published • 4 -
SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios
Paper • 2512.18470 • Published • 12 -
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation
Paper • 2601.09688 • Published • 127
Semantic Web
-
josancamon/kg-gen-MINE-evaluation-dataset
Viewer • Updated • 101 • 214 • 4 -
zilliz/semantic-highlight-bilingual-v1
Token Classification • Updated • 6.28k • 94 -
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation
Paper • 2601.09688 • Published • 127
LLM