AI-Native Systems Research¶
What if a system could observe its own behavior, hypothesize improvements, validate them, and deploy — continuously, at machine speed?
The Vision¶
Modern software systems serving AI workloads are extraordinarily complex and must evolve under relentless pressure — new models, new hardware, changing usage patterns, shifting objectives. Today, even with powerful AI tools, every improvement is mediated by humans step by step. This human-mediated loop has become the bottleneck.
AI-Native Systems close this loop. In an AI-Native System, AI is the primary agent of continuous creation, evolution, and operation. Humans define objectives, constraints, and governance — while the system continuously executes within those boundaries.
```mermaid
graph LR
    O[Observe] --> R[Reason]
    R --> C[Change]
    C --> E[Experiment]
    E --> V[Validate]
    V --> D[Deploy]
    D --> O
```
The continuous meta-loop: from observation to deployment, then back again.
Architecture¶
An AI-native system consists of two parts:
System Under Control¶
The software system that delivers business value and is subject to continuous evolution — inference platforms, kernel pipelines, storage systems. It is not necessarily an AI system itself.
Controlling System¶
The agentic, AI-driven layer that continuously improves the System Under Control. It comprises two components: a Reasoner (observes, hypothesizes, proposes goals) and a Changer (plans, experiments, produces artifacts).
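To make the split concrete, here is a minimal sketch of one iteration of the meta-loop, with the Reasoner and Changer as separate components. All names (`Proposal`, `Reasoner.observe`, `Changer.experiment`, the latency heuristic) are illustrative assumptions, not part of any actual AI-Native Systems codebase:

```python
from dataclasses import dataclass, field

@dataclass
class Proposal:
    """A hypothesized improvement with its provenance: what, why, and evidence."""
    goal: str
    rationale: str
    evidence: list = field(default_factory=list)

class Reasoner:
    """Observes the System Under Control, hypothesizes, and proposes goals."""
    def observe(self, metrics: dict):
        # Toy heuristic: treat high tail latency as a latent optimization opportunity.
        if metrics.get("p99_latency_ms", 0) > 200:
            return Proposal(
                goal="reduce p99 latency below 200 ms",
                rationale=f"observed p99 = {metrics['p99_latency_ms']} ms",
            )
        return None  # nothing to improve this cycle

class Changer:
    """Plans experiments for a proposal and produces candidate artifacts."""
    def experiment(self, proposal: Proposal) -> dict:
        # Placeholder: a real Changer would generate a change, run it in a
        # simulation environment, and attach the resulting evidence.
        proposal.evidence.append("simulated run: p99 fell to 180 ms")
        return {"change": "tune cache configuration", "goal": proposal.goal}

def control_loop(metrics: dict):
    """One pass: observe -> reason -> change -> experiment.

    Validation and deployment would gate on proposal.evidence before rollout.
    """
    reasoner, changer = Reasoner(), Changer()
    proposal = reasoner.observe(metrics)
    if proposal is None:
        return None
    return changer.experiment(proposal)
```

In a real system the heuristic inside `Reasoner.observe` would be an agentic AI model and `Changer.experiment` would drive actual experiments, but the governed hand-off — a proposal with rationale and evidence crossing the boundary between the two components — is the point of the architecture.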
Key Principles¶
- Continuous, proactive evolution — not just reactive to failures, but seeking latent optimization opportunities
- Governed autonomy — every change has complete provenance: what, why, and evidence
- Spec-driven development — specifications are live documents that evolve with the system
- Experimentation as a first-class activity — exploring a space of possibilities, not relying on single proposed fixes
- Hyper-specialization — systems optimized for how they are actually used in each specific deployment
- Simulation environments — enabling rapid evolution and verification when real-world experimentation is too costly (e.g., BLIS, our high-fidelity llm-d simulator)
Technical Domains¶
We are applying the AI-native vision to three initial domains:
llm-d¶
A Kubernetes-native distributed LLM inference framework. AI-native approaches drive intelligent scheduling, KV-cache optimization, and continuous performance tuning.
AI-Generated Kernels¶
Autonomous generation and optimization of compute kernels for GPUs and accelerators — driven by workload observations, evolutionary techniques, and continuous experimentation.
Storage Systems¶
Applying spec-driven development and AI-native continuous improvement to storage infrastructure — enabling domain-specific, self-optimizing, and workload-aware storage systems.
Latest from the Blog¶
Check our blog for the latest posts on AI-Native Systems research, progress updates, and deep dives into specific domains.
AI-Native Systems Research · Apache 2.0