KubeCon NA 2025: Erica Hughberg and Alexa Griffith on GenAI Tools

Building Scalable Generative AI Platforms: Key Insights from KubeCon + CloudNativeCon NA 2025

Generative AI introduces new workloads with unique traffic patterns and demanding infrastructure requirements. Serving them well calls for a fresh toolbox tailored to the GenAI era.

Last week at KubeCon + CloudNativeCon North America 2025, Erica Hughberg (Tetrate) and Alexa Griffith (Bloomberg) shared what it takes to design and deploy GenAI platforms that deliver model inference at scale.

---

New Requirements for GenAI Applications

Next-generation AI workloads need capabilities beyond traditional platforms:

  • Dynamic, model-based routing – Intelligently direct requests to the most appropriate model instance.
  • Token-level rate limiting – Control usage at the token level rather than per request.
  • Secure & centralized credential management – Safely store and distribute API keys and secrets.
  • Observability, resilience, and failover – Ensure robust monitoring and automatic recovery for AI workloads.

> Traditional platforms often fall short because they lack AI-native logic, rely only on basic rate limiting, and use request-based routing unsuitable for many AI patterns.
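
Token-level rate limiting is the clearest example of AI-native logic: the gateway budgets the token counts a model reports per response, rather than counting requests. As a hedged sketch of how this looks with Envoy AI Gateway (resource and field names follow its v1alpha1 API and may differ by version; names like `llm-route` and `x-user-id` are illustrative), a route records token usage as metadata and a traffic policy spends it against a budget:

```yaml
# Sketch only: budget tokens per user instead of counting requests.
# Field names follow Envoy AI Gateway's v1alpha1 API; verify against
# the CRD reference for your version before use.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: llm-route
spec:
  schema:
    name: OpenAI
  # Record each response's input-token count as per-request metadata.
  llmRequestCosts:
    - metadataKey: llm_input_token
      type: InputToken
---
# Envoy Gateway policy that charges the recorded token cost to a budget.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: token-budget
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: ai-gateway
  rateLimit:
    type: Global
    global:
      rules:
        - clientSelectors:
            - headers:
                - name: x-user-id   # one budget per distinct user
                  type: Distinct
          limit:
            requests: 100000        # read as a token budget via the cost below
            unit: Hour
          cost:
            response:
              from: Metadata
              metadata:
                namespace: io.envoy.ai_gateway
                key: llm_input_token
```

The key design point is that the "cost" of a request is only known after the response arrives, which is why the budget is charged from response metadata rather than at admission time.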

---

Core Tools for GenAI Infrastructure

You can build modern GenAI infrastructure within the Kubernetes ecosystem using:

Model Serving & Deployment

  • KServe – Serverless model serving with a declarative API.
  • vLLM – Optimized inference serving for large language models.
  • llm-d – Kubernetes‑native distributed LLM serving.

Gateway & Traffic Management

  • Envoy – Programmable gateway for routing and policy enforcement.
  • Envoy AI Gateway – Specialized gateway for AI workloads.

Observability

---

Example Architecture: Envoy AI Gateway + KServe

Envoy AI Gateway

Designed to operate at the edge, Envoy AI Gateway controls application traffic to GenAI services such as KServe `InferenceService` endpoints or Model Context Protocol (MCP) servers.

Two-tier gateway pattern:

  • Tier One Gateway (AI Gateway)
      • Centralized entry point
      • Handles authentication, a unified LLM API, and token-based rate limiting
      • Can proxy MCP servers
  • Tier Two Gateway (Reference Gateway)
      • Manages ingress traffic inside Kubernetes clusters
      • Provides fine-grained access control to models

Supported providers include OpenAI, Azure OpenAI, Google Gemini, Vertex AI, AWS Bedrock, and Anthropic.
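
To make the tier-one routing concrete, here is a hedged sketch of model-based routing with an `AIGatewayRoute` (v1alpha1 field names; backend names are placeholders). The gateway extracts the requested model from the OpenAI-style request body into a header, so routes can match on it:

```yaml
# Sketch: route requests to a backend based on the model the client asked for.
# Follows Envoy AI Gateway's v1alpha1 API; verify field names for your version.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: model-router
spec:
  schema:
    name: OpenAI          # clients speak one unified OpenAI-style API
  rules:
    # The gateway copies the "model" field from the request body into the
    # x-ai-eg-model header, which these matches inspect.
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: gpt-4o
      backendRefs:
        - name: openai-backend   # backend pointing at a hosted provider
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: llama-3-8b
      backendRefs:
        - name: kserve-llama     # in-cluster KServe-served model
```

This is what "dynamic, model-based routing" means in practice: one client-facing API, with per-model decisions about whether a request goes to a hosted provider or to a self-hosted model in the cluster.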

KServe Enhancements for GenAI

  • Multi-framework LLM support
  • OpenAI-compatible APIs
  • LLM model caching & KV cache offloading
  • Multi-node inference & metric-based autoscaling
  • Native Hugging Face integration with simplified deployment

KServe exposes a Kubernetes Custom Resource Definition (CRD) and integrates with llm-d for distributed LLM serving, supporting PyTorch, TensorFlow, ONNX, and Hugging Face models.

Within the CRD YAML, the `InferenceService` resource defines model metadata and configures external API access.
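
A minimal `InferenceService` for an LLM might look like the sketch below (the model ID and resource values are illustrative; check the KServe docs for the exact runtime arguments in your version):

```yaml
# Sketch: serve a Hugging Face LLM through KServe's InferenceService CRD.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama3-8b
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface   # use the Hugging Face LLM serving runtime
      args:
        - --model_name=llama3
        - --model_id=meta-llama/Meta-Llama-3-8B-Instruct
      resources:
        requests:
          cpu: "6"
          memory: 24Gi
          nvidia.com/gpu: "1"
        limits:
          cpu: "6"
          memory: 24Gi
          nvidia.com/gpu: "1"
```

Once applied, KServe reconciles this declarative spec into a serving deployment and exposes an OpenAI-compatible endpoint for the named model.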

---


Key Takeaways

Hughberg and Griffith emphasized that GenAI workloads are:

  • Stateful
  • Resource-intensive
  • Token-based

Meeting these challenges requires AI-native features:

  • Dynamic, model-driven routing
  • Token-level rate limiting
  • Integrated cost control

With open-source CNCF tools like Kubernetes, Envoy AI Gateway, and KServe, teams can build reliable, scalable GenAI applications.

---

Next Steps for Your Team:

  • Evaluate your AI workload’s routing, rate-limiting, and observability needs.
  • Prototype with Envoy AI Gateway + KServe for model serving and traffic control.
  • Iterate on scalability and interoperability to support future growth.
