arxiv:2603.28342

Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization

Published on Mar 30
· Submitted by duhe on Mar 31
Abstract

Kernel-Smith is a GPU kernel generation framework that combines evolutionary algorithms with post-training reinforcement learning to optimize performance across different hardware backends.

AI-generated summary

We present Kernel-Smith, a framework for high-performance GPU kernel and operator generation that combines a stable evaluation-driven evolutionary agent with an evolution-oriented post-training recipe. On the agent side, Kernel-Smith maintains a population of executable candidates and iteratively improves them using an archive of top-performing and diverse programs together with structured execution feedback on compilation, correctness, and speedup. To make this search reliable, we build backend-specific evaluation services for Triton on NVIDIA GPUs and Maca on MetaX GPUs. On the training side, we convert long-horizon evolution trajectories into step-centric supervision and reinforcement learning signals by retaining correctness-preserving, high-gain revisions, so that the model is optimized as a strong local improver inside the evolutionary loop rather than as a one-shot generator. Under a unified evolutionary protocol, Kernel-Smith-235B-RL achieves state-of-the-art overall performance on KernelBench with the NVIDIA Triton backend, attaining the best average speedup ratio and outperforming frontier proprietary models including Gemini-3.0-pro and Claude-4.6-opus. We further validate the framework on the MetaX MACA backend, where our Kernel-Smith-MACA-30B surpasses large-scale counterparts such as DeepSeek-V3.2-think and Qwen3-235B-2507-think, highlighting its potential for seamless adaptation across heterogeneous platforms. Beyond benchmark results, the same workflow produces upstream contributions to production systems including SGLang and LMDeploy, demonstrating that LLM-driven kernel optimization can transfer from controlled evaluation to practical deployment.
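The evolutionary loop described above (archive of top candidates, model-proposed revisions, structured execution feedback) can be sketched roughly as follows. This is a hypothetical illustration, not the authors' implementation: `evaluate` stands in for the backend evaluation service, `mutate` stands in for the LLM improver, and a toy integer tuning knob replaces real GPU kernel code.

```python
import random

def evaluate(candidate):
    """Stand-in for the evaluation service: returns (correct, speedup).

    Toy objective: the candidate is valid in (0, 128] and its simulated
    speedup peaks when the tuning knob equals 64.
    """
    correct = 0 < candidate <= 128
    speedup = max(0.0, 2.0 - abs(candidate - 64) / 32) if correct else 0.0
    return correct, speedup

def mutate(candidate, rng):
    """Stand-in for the LLM improver: propose a small local revision."""
    return candidate + rng.choice([-8, -4, -2, 2, 4, 8])

def evolve(generations=50, archive_size=4, seed=0):
    rng = random.Random(seed)
    # Archive of executable candidates (toy: integer tuning knobs).
    archive = [rng.randint(1, 128) for _ in range(archive_size)]
    for _ in range(generations):
        parent = rng.choice(archive)        # sample a program from the archive
        child = mutate(parent, rng)         # model proposes a revision
        correct, _ = evaluate(child)        # structured execution feedback
        if correct:                         # keep only valid candidates
            archive.append(child)
            # Retain the top-`archive_size` programs by speedup.
            archive.sort(key=lambda c: evaluate(c)[1], reverse=True)
            archive = archive[:archive_size]
    archive.sort(key=lambda c: evaluate(c)[1], reverse=True)
    return archive

archive = evolve()
print("best candidate:", archive[0], "score:", evaluate(archive[0]))
```

A real system would also track diversity in the archive (not just top speedup) so the improver sees varied starting points, as the summary notes.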

Community

Paper author · Paper submitter

Welcome to our new work: Kernel-Smith 🚀.

We move past the bottleneck of one-shot LLM operator generation by introducing an evolutionary agent mechanism. The core idea is simple: let the model learn how to improve kernels, step by step, inside the evolutionary loop.

  • How it works: we build stable evaluation services (NVIDIA & MetaX), capture high-gain steps from long-horizon evolution trajectories, and convert them into reinforcement learning signals.
  • Results: Kernel-Smith achieves SOTA on KernelBench. It is not only fast; more importantly, it has a high success rate during evolution.
  • Deployment: we do not chase synthetic metrics. The operators we optimized have been contributed directly to SGLang and LMDeploy.
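The "high-gain steps" selection mentioned above can be illustrated with a minimal sketch: from a trajectory of parent-to-child revisions, keep only those that preserve correctness and deliver a meaningful speedup gain. The field names and the gain threshold here are assumptions for illustration, not the paper's actual schema.

```python
def select_training_steps(trajectory, min_gain=1.05):
    """Keep correctness-preserving, high-gain revisions as training examples.

    trajectory: list of dicts describing one parent->child revision each,
    with correctness flags and measured speedups (hypothetical schema).
    """
    kept = []
    for step in trajectory:
        correctness_preserved = step["parent_correct"] and step["child_correct"]
        gain = step["child_speedup"] / step["parent_speedup"]
        if correctness_preserved and gain >= min_gain:
            # Each surviving (parent, child) pair becomes a supervision
            # or RL example for training the model as a local improver.
            kept.append((step["parent_code"], step["child_code"], gain))
    return kept

trajectory = [
    {"parent_code": "k0", "child_code": "k1", "parent_correct": True,
     "child_correct": True, "parent_speedup": 1.0, "child_speedup": 1.4},
    {"parent_code": "k1", "child_code": "k2", "parent_correct": True,
     "child_correct": False, "parent_speedup": 1.4, "child_speedup": 2.0},
    {"parent_code": "k1", "child_code": "k3", "parent_correct": True,
     "child_correct": True, "parent_speedup": 1.4, "child_speedup": 1.41},
]

print(select_training_steps(trajectory))  # only the first revision survives
```

The second step is dropped because the child breaks correctness (a fast but wrong kernel is useless), and the third because its gain is marginal.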

You are welcome to try the demo on the project page!

Get this paper in your agent:

hf papers read 2603.28342
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
