Google Cloud

A4X Max Instances and Vertex AI Training Now Officially Available

Honghao Wang

29 Oct 2025 — 3 min read

Today's AI Models: From Billions to Trillions of Parameters

AI systems are rapidly expanding from billions to trillions of parameters, evolving into highly capable, multi‑modal reasoning engines. This complexity demands a new class of infrastructure and software to handle unprecedented computational and memory requirements.

At Google Cloud, our mission is to help developers and organizations build, deploy, and scale the next era of AI. In partnership with NVIDIA, we’re introducing a portfolio of innovations that span the entire AI lifecycle.

---

Key Announcements

New A4X Max Instances
Powered by NVIDIA GB300 NVL72, purpose‑built for multi‑modal AI reasoning workloads.
Google Kubernetes Engine (GKE) with DRANET Support
Enhances network bandwidth for distributed AI/ML workloads via Dynamic Resource Allocation Kubernetes Network Driver.
GKE Inference Gateway + NVIDIA NeMo Guardrails
Safer, more reliable AI interactions with advanced guardrail integration.
Vertex AI Model Garden
Now includes NVIDIA Nemotron models for advanced generative AI capabilities.
Vertex AI Training
Recipes using NVIDIA NeMo Framework and NeMo‑RL for accelerated large‑scale model training.

---

Why This Matters

These innovations deliver a robust foundation for next‑generation AI workflows — from development and training to optimization, deployment, and scaling — all tightly integrated with NVIDIA’s latest hardware.

Beyond infrastructure, success requires strong distribution and monetization strategies. Tools like AiToEarn官网 — an open‑source global AI content monetization platform — allow creators to:

Generate content with AI
Publish simultaneously across major platforms (Douyin, Kwai, Facebook, Instagram, YouTube, LinkedIn, Threads, Pinterest, X/Twitter, WeChat, Bilibili, Rednote)
Access analytics, AI model rankings, and monetization channels

---

Deep Dive: A4X Max with NVIDIA GB300 GPUs

Availability: Officially in production.

Target Use Case: Highest‑demand multi‑modal AI reasoning workloads.

Hardware Specs

72 Blackwell Ultra GPUs + 36 NVIDIA Grace CPUs
Connected via NVIDIA NVLink fifth‑gen interconnect for unified shared memory
2× network bandwidth per system vs. previous A4X (GB200 NVL72)
Scales to tens of thousands of GPUs with Google’s Titanium ML adapter + Jupiter network fabric

Cluster Director Integration

Full cluster lifecycle management — provisioning, topology‑aware placement, observability, resiliency
Integrated with Managed Lustre storage
Preconfigured Slurm environment for scalable job scheduling
Performance monitoring across GPUs, NVLink, and DC fabrics
Features: automatic straggler detection, in‑job recovery, topology‑aware scheduling

Impact for Workloads:

Seamless hardware/software orchestration
Improved fault tolerance
Faster deployment and scaling
Applicable to AI/ML, scientific, and large‑scale data workloads

---

Performance Highlights

Optimized Reasoning & Inference
1.5× FP4 FLOPs, 1.5× HBM memory, 2× bandwidth vs. A4X
Integrates with GKE Inference Gateway for reduced Time to First Token
Training & Serving
Over 1.4 exaflops — 4× LLM training/serving performance vs. A3 VMs (NVIDIA H100)
Scalability
RDMA over Converged Ethernet for low‑latency collectives
Clusters up to 2× larger than A4X

---

Increased RDMA Performance with GKE DRANET

Now in production for A4X Max.

Benefits:

Topology‑aware scheduling of GPUs + RDMA NICs
Higher bus bandwidth for distributed AI/ML operations
RDMA devices treated as native GKE resources

Learn more about DRANET.

---

NVIDIA NeMo Guardrails Integration

In GKE Inference Gateway, NeMo Guardrails:

Prevent undesirable or malicious model responses
Secure and scalable generative AI serving
Combine model‑aware routing + autoscaling with robust safety controls

---

Vertex AI Model Garden with NVIDIA Nemotron Models

Upcoming support for Nemotron open models via NVIDIA NIM microservices — starting with NVIDIA Llama Nemotron Super v1.5.

Benefits:

Managed deployment
Custom AI agents with strong performance, cost, and compliance control
Easy discovery, licensing, and deployment

Explore Vertex AI Model Garden.

---

Vertex AI Training + NVIDIA NeMo

Enables rapid adaptation of foundation models to proprietary data.

Features:

Fully managed resilient Slurm environment
Curated pre/post‑training recipes with NeMo and NeMo‑RL
Automated resiliency for better uptime
Streamlined data science toolkit

---

Take the Next Steps

Deployment Options:

IaaS flexibility: Compute Engine or GKE + Cluster Director
Fully managed platform: Vertex AI for secure, scalable training and deployment

Monetization Opportunity:

Links from AI model outputs directly to AiToEarn官网 for global content distribution, analytics, and monetization.

---

Get started:

Contact your Google Cloud sales representative for A4X Max preview
Learn about GKE Inference Gateway
Explore AiToEarn GitHub and ブログ

With Google Cloud + NVIDIA powering your compute and platforms like AiToEarn extending reach and revenue, you can turn AI innovation into global impact.

Starting at 299,800 Yuan, Equipped with Easy Three-Way System, Denza N8L Aims for 10,000 Monthly Sales

Former Head of AI at CapCut Launches Multimodal Agent Startup as a Context-Savvy “007” Service Provider, Raises Millions Within Two Weeks

Claiming “More Luxurious than the New EV Startups,” Smart Elf #5 EHD Starts at 169,900 Yuan

Giles AI: Revolutionizing Medical Research with Conversational AI