Skip to content

AI-native Systems Research

AI-Generated Kernels

AI-native-Systems-Research/ai-native-systems-research

AI-Generated Kernels for GPUs and Accelerators¶

Compute kernels for accelerators like GPUs are at the heart of AI workloads. These kernels must be optimized for specific model families, data layouts, mathematical representations, and hardware models. Today, this optimization is a manual, expert-driven process. AI-Native Systems change this fundamentally.

The Opportunity¶

In a production inference environment, kernel usage varies over time. Some kernels are heavily exercised while others go unused. When new models arrive or workload patterns shift, kernels that were once idle become critical — and may not be optimized for their new usage profile.

An AI-native approach continuously observes kernel execution on the accelerator. When it detects changes in usage patterns — a kernel seeing new dimensions, increased frequency, or novel access patterns — it hypothesizes that there is room for improvement, even without an SLO violation.

How It Works¶

graph TD
    A[Observe kernel execution] --> B[Detect usage changes]
    B --> C[Hypothesize optimization opportunity]
    C --> D[Generate kernel variants]
    D --> E[Experiment & benchmark]
    E --> F{Improvement validated?}
    F -->|Yes| G[Deploy optimized kernel]
    F -->|No| H[Refine hypothesis]
    H --> D
    G --> A

The system uses LLM-based evolutionary approaches, reinforcement learning, or hybrid methods to generate and refine kernel variants. Each candidate is benchmarked against the production workload profile, and only validated improvements are deployed.

Key Properties¶

Workload-aware — kernels optimized for actual usage, not synthetic benchmarks
Continuous — optimization runs alongside production, not as a one-time effort
Environment-specific — each deployment gets kernels tuned for its particular hardware and workload mix
Human-optional — the full loop from detection to deployment can run autonomously, with human gating as desired

This builds on evolutionary code improvement techniques demonstrated by systems such as Sky-Discover, Kernel Bench, and OpenEvolve. AI-Native Systems generalize these ideas into a continuous, governed loop.

We use cookieless Google Analytics to count how many readers each post gets — no cookies, no tracking across sites. Your page URL (without query parameters), browser, and approximate location may be processed. Read what's collected →