Robert Nishihara: Open-Source AI Computing Solution with Kubernetes, Ray, PyTorch, and vLLM

AI Compute Stacks for Emerging Workloads

AI workloads are rapidly increasing in complexity — both in computing power and data requirements.

Technologies like Kubernetes and PyTorch are critical in building production-ready AI systems capable of handling such demands.

At KubeCon + CloudNativeCon North America 2025, Robert Nishihara (Anyscale) shared insights on how a stack integrating Kubernetes, Ray, PyTorch, and vLLM can effectively power next-generation AI workloads.

---

Overview of Ray

Ray is an open-source framework designed for scaling machine learning and Python applications.

It originated from a reinforcement learning research project at Berkeley and now orchestrates infrastructure for distributed workloads.

Ray recently joined the PyTorch Foundation to deepen its role in the open-source AI ecosystem.

---

Drivers of Next-Generation AI Workloads

Nishihara identified three primary drivers:

  • Data Processing
      • Shift from traditional tabular data to multimodal datasets (images, videos, audio, text, sensor data)
      • Multimodal datasets are essential for inference tasks in modern AI
  • Model Training
      • Models are growing in size and complexity
      • Training uses distributed CPU/GPU computing to accelerate development
  • Model Serving
      • Efficient deployment at scale requires flexible frameworks
      • Must support high-throughput, low-latency inference

Key Trend:

Hardware requirements must now accommodate GPUs alongside CPUs.

Computing focus has shifted from “SQL ops on CPUs” to “inference ops on GPUs”.

---

Example: AiToEarn Platform

AiToEarn demonstrates how such stacks enable content creation and monetization:

  • Generates AI content
  • Publishes across multiple platforms (Douyin, WeChat, Facebook, YouTube, etc.)
  • Offers analytics and AI model rankings
  • Fully open source

Purpose:

Connect AI tools, cross-platform publishing, analytics, and deployment — much like Kubernetes + PyTorch + Ray does for enterprise AI workloads.

---

Ray for Model Training and Inference

Model training includes:

  • Reinforcement Learning (RL)
  • Post-training dataset generation via inference

Using Ray’s Actor API:

  • An Actor is a stateful worker
  • Instantiating an actor class creates a dedicated worker process
  • Method calls on that instance are scheduled to the same worker, so state persists across calls

Performance Boost:

Ray supports RDMA for direct GPU memory transport → faster object transfers.

---

RL Frameworks Built on Ray

Examples include:

Training Engines:

Serving Engines:

  • Hugging Face, vLLM, SGLang, OpenAI

---

Architecture View — Top & Bottom Layers

Top Layers:

  • AI workloads
  • Model training/inference frameworks (PyTorch, vLLM, Megatron, SGLang)

Bottom Layers:

Bridge Layer:

  • Distributed compute frameworks (Ray, Spark)
  • Manage data ingestion and movement

---

Kubernetes + Ray: Complementary Roles

  • Kubernetes → container-level isolation
  • Ray → process-level isolation
  • Both provide vertical & horizontal autoscaling

Dynamic GPU Allocation:

  • Inference workloads fluctuate compared to training
  • Ray + Kubernetes → reallocate GPUs as needed

---

Essential Capabilities for AI Platforms

Nishihara emphasized:

  • Native multi-cloud support
  • Workload prioritization tied to GPU reservations
  • Observability & tooling at container, workload, and process levels
  • Model/data lineage tracking
  • Governance

Observability Tip:

Track object transfer speeds & performance across all levels.

---


In Summary:

The integration of Kubernetes, PyTorch, vLLM, and Ray forms a powerful stack that can:

  • Train large-scale models efficiently
  • Serve them with low latency
  • Dynamically allocate compute resources
  • Enable both enterprise AI workloads and creative AI monetization platforms

