KubeCon NA 2025: Erica Hughberg and Alexa Griffith on GenAI Tools

Building Scalable Generative AI Platforms: Key Insights from KubeCon + CloudNativeCon NA 2025

Generative AI introduces new workloads with unique traffic patterns and demanding infrastructure requirements. Serving them well calls for a fresh toolbox tailored to the GenAI era.

Last week at KubeCon + CloudNativeCon North America 2025, Erica Hughberg (Tetrate) and Alexa Griffith (Bloomberg) shared what it takes to design and deploy GenAI platforms that deliver model inference at scale.

---

New Requirements for GenAI Applications

Next-generation AI workloads need capabilities beyond traditional platforms:

  • Dynamic, model-based routing – Intelligently direct requests to the most appropriate model instance.
  • Token-level rate limiting – Control usage at the token level rather than per request.
  • Secure & centralized credential management – Safely store and distribute API keys and secrets.
  • Observability, resilience, and failover – Ensure robust monitoring and automatic recovery for AI workloads.

> Traditional platforms often fall short because they lack AI-native logic, rely only on basic rate limiting, and use request-based routing unsuitable for many AI patterns.
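
Token-level rate limiting is the clearest example of AI-native logic: the gateway budgets the token counts a model reports per response, rather than counting requests. As a hedged sketch of how this looks with Envoy AI Gateway (resource and field names follow its v1alpha1 API and may differ by version; names like `llm-route` and `x-user-id` are illustrative), a route records token usage as metadata and a traffic policy spends it against a budget:

```yaml
# Sketch only: budget tokens per user instead of counting requests.
# Field names follow Envoy AI Gateway's v1alpha1 API; verify against
# the CRD reference for your version before use.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: llm-route
spec:
  schema:
    name: OpenAI
  # Record each response's input-token count as per-request metadata.
  llmRequestCosts:
    - metadataKey: llm_input_token
      type: InputToken
---
# Envoy Gateway policy that charges the recorded token cost to a budget.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: token-budget
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: ai-gateway
  rateLimit:
    type: Global
    global:
      rules:
        - clientSelectors:
            - headers:
                - name: x-user-id   # one budget per distinct user
                  type: Distinct
          limit:
            requests: 100000        # read as a token budget via the cost below
            unit: Hour
          cost:
            response:
              from: Metadata
              metadata:
                namespace: io.envoy.ai_gateway
                key: llm_input_token
```

The key design point is that the "cost" of a request is only known after the response arrives, which is why the budget is charged from response metadata rather than at admission time.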

---

Core Tools for GenAI Infrastructure

You can build modern GenAI infrastructure within the Kubernetes ecosystem using:

Model Serving & Deployment

  • KServe – Serverless model serving with a declarative API.
  • vLLM – Optimized inference serving for large language models.
  • llm-d – Kubernetes‑native distributed LLM serving.

Gateway & Traffic Management

  • Envoy – Programmable gateway for routing and policy enforcement.
  • Envoy AI Gateway – Specialized gateway for AI workloads.

Observability

---

Example Architecture: Envoy AI Gateway + KServe

Envoy AI Gateway

Designed to operate at the edge, Envoy AI Gateway controls application traffic to GenAI services such as KServe `InferenceService` endpoints or Model Context Protocol (MCP) servers.

Two-tier gateway pattern:

  • Tier One Gateway (AI Gateway)
      • Centralized entry point
      • Handles authentication, a unified LLM API, and token-based rate limiting
      • Can proxy MCP servers
  • Tier Two Gateway (Reference Gateway)
      • Manages ingress traffic inside Kubernetes clusters
      • Provides fine-grained access control to models

Supported providers include OpenAI, Azure OpenAI, Google Gemini, Vertex AI, AWS Bedrock, and Anthropic.
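
To make the tier-one routing concrete, here is a hedged sketch of model-based routing with an `AIGatewayRoute` (v1alpha1 field names; backend names are placeholders). The gateway extracts the requested model from the OpenAI-style request body into a header, so routes can match on it:

```yaml
# Sketch: route requests to a backend based on the model the client asked for.
# Follows Envoy AI Gateway's v1alpha1 API; verify field names for your version.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: model-router
spec:
  schema:
    name: OpenAI          # clients speak one unified OpenAI-style API
  rules:
    # The gateway copies the "model" field from the request body into the
    # x-ai-eg-model header, which these matches inspect.
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: gpt-4o
      backendRefs:
        - name: openai-backend   # backend pointing at a hosted provider
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: llama-3-8b
      backendRefs:
        - name: kserve-llama     # in-cluster KServe-served model
```

This is what "dynamic, model-based routing" means in practice: one client-facing API, with per-model decisions about whether a request goes to a hosted provider or to a self-hosted model in the cluster.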

KServe Enhancements for GenAI

  • Multi-framework LLM support
  • OpenAI-compatible APIs
  • LLM model caching & KV cache offloading
  • Multi-node inference & metric-based autoscaling
  • Native Hugging Face integration with simplified deployment

KServe exposes a Kubernetes Custom Resource Definition (CRD) and integrates with llm-d for distributed LLM serving, supporting PyTorch, TensorFlow, ONNX, and Hugging Face models.

Within the CRD YAML, the `InferenceService` resource defines model metadata and configures external API access.
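
A minimal `InferenceService` for an LLM might look like the sketch below (the model ID and resource values are illustrative; check the KServe docs for the exact runtime arguments in your version):

```yaml
# Sketch: serve a Hugging Face LLM through KServe's InferenceService CRD.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama3-8b
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface   # use the Hugging Face LLM serving runtime
      args:
        - --model_name=llama3
        - --model_id=meta-llama/Meta-Llama-3-8B-Instruct
      resources:
        requests:
          cpu: "6"
          memory: 24Gi
          nvidia.com/gpu: "1"
        limits:
          cpu: "6"
          memory: 24Gi
          nvidia.com/gpu: "1"
```

Once applied, KServe reconciles this declarative spec into a serving deployment and exposes an OpenAI-compatible endpoint for the named model.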

---


Key Takeaways

Hughberg and Griffith emphasized that GenAI workloads are:

  • Stateful
  • Resource-intensive
  • Token-based

Meeting these challenges requires AI-native features:

  • Dynamic, model-driven routing
  • Token-level rate limiting
  • Integrated cost control

With open-source CNCF tools like Kubernetes, Envoy AI Gateway, and KServe, teams can build reliable, scalable GenAI applications.

---

Next Steps for Your Team:

  • Evaluate your AI workload’s routing, rate-limiting, and observability needs.
  • Prototype with Envoy AI Gateway + KServe for model serving and traffic control.
  • Iterate on scalability and interoperability to support future growth.
