HAMi 2.7.0 Major Release | Broader Heterogeneous Chip Support, More Stable Scheduling, Stronger Ecosystem
# **HAMi v2.7.0 — Unified GPU Scheduling Across Heterogeneous Hardware**

> **Source:** Dynamia AI

---

## **Of Silicon & Scheduling — Connecting All Chips Through One Order**

A tribute to *Kubernetes 1.34*’s *Of Wind & Will*. There, the path is defined by *wind* and *will*; here, we navigate by **silicon** and **order**.

- **Silicon** — The computing essence, varied in form and temperament.  
- **Order** — The human-imposed rhythm and structure, which enables navigation through complexity.

When diverse chips converge on the same sea, we cannot predict wind direction, but we **guarantee an order capable of carrying them forward**. Releases emerge not because perfection is achieved, but because **order lets imperfection run in parallel**.

---

## **Release Overview**

We’re proud to announce the **HAMi v2.7.0** release — delivering:

- **Broader hardware ecosystem coverage**
- **Core scheduler optimizations**
- **Critical stability improvements**
- **Application ecosystem integrations**
- **Community growth**
- **WebUI enhancements**

![image](https://blog.aitoearn.ai/content/images/2025/10/img_001-372.jpg)

---

## **Highlights at a Glance**

### **Hardware Ecosystem**
- Support for **Kunlun XPU, Enflame GCU, AWS Neuron**
- **MetaX**: sGPU sharing (compute/memory) + 3 QoS modes
- **MetaXLink**: Topology awareness + intelligent scheduling
- NVIDIA GPU topology scheduling upgrade

### **Scheduler Optimizations**
- Scheduling failure event aggregation
- NVIDIA abnormal card handling
- Extended **ResourceQuota** — accurate quota calculation for multi-GPU requests
- Improved **observability & robustness**

### **Application Integrations**
- **vLLM** compatibility enhancements
- **Xinference** Helm Chart integration (HAMi vGPU support)
- **Volcano Dynamic MIG** support

### **Community**
- New contributors, reviewers, maintainers
- CNCF case studies and talks demonstrating broad adoption

### **WebUI Enhancements**
- End-to-end monitoring integration for **Metax**
- Better heterogeneous metric visualization

---

## **Community Updates**

### CNCF Case Studies
- **SF Technology (Effective GPU)**: Large-scale heterogeneous compute pooling & scheduling  
  → [Case study 1]
- **PREP-EDU**: Improved AI training efficiency in educational platforms  
  → [Case study 2]

### Recognitions
- **vCluster** technical seminar: Praised for proxy-layer CUDA API interception for fine-grained governance  
  → Replay [3]
- **Linux Foundation AI_dev Summit**: Showcased flexible GPU slicing and software-defined isolation  
  → Replay [4]
- **Vietnam Telecom**: GPU + eBPF for Kubernetes observability and management  
  → [Meetup 5] & [YouTube 6]

---

# **Feature Deep Dive**

---

## **MetaX — sGPU Sharing, QoS Management, Topology-Aware Scheduling**

**Key Features:**
1. **GPU Sharing (sGPU)** — Multiple containers share a physical GPU.
2. **Resource Isolation** — Limit GPU memory and compute cores per task.
3. **Topology Awareness** — Prefers high-bandwidth GPU groups (MetaXLink, PCIe Switch).
4. **QoS Policies** — **BestEffort**, **FixedShare**, **BurstShare**.
5. **Health & Monitoring** — WebUI integration with clear cluster-wide metrics.

### **Topology Optimization Principle**
**Goal**: Efficient multi-GPU job execution within high-speed interconnect groups.

**Two-stage decision:**
- **Stage 1 — Intra-node selection (priority rules)**:
  - Group GPUs by linkZone ID.
  - Highest priority: All needed GPUs within same linkZone.
  - Next: Cross-domain group, then fallback to unknown topology.
- **Stage 2 — Inter-node scoring** (see the sketch after the mode list below):
  - `Final Score = (10 * allocatedScore) - lossScore`
  - `allocatedScore` → how tightly the selected GPUs are interconnected.
  - `lossScore` → penalty for breaking up topology that future workloads could still use.

**Modes:**
- `binpack` → Minimize topology damage.
- `spread` → Maximize bandwidth grouping.
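
To make the formula concrete, here is a minimal Go sketch of the stage-2 scoring, assuming per-candidate `allocatedScore` and `lossScore` values have already been computed; the `candidate` type and `pickBest` helper are illustrative, not HAMi's actual API.

```go
package main

import "fmt"

// candidate describes one possible GPU set considered by the scoring stage.
// Field names are illustrative only.
type candidate struct {
	name           string
	allocatedScore float64 // how tightly the chosen GPUs are interconnected
	lossScore      float64 // how much future topology value the choice destroys
}

// finalScore applies the formula from this section:
// Final Score = (10 * allocatedScore) - lossScore.
func finalScore(c candidate) float64 {
	return 10*c.allocatedScore - c.lossScore
}

// pickBest returns the candidate with the highest final score, i.e. the best
// trade-off between current tightness and preserved topology.
func pickBest(cands []candidate) candidate {
	best := cands[0]
	for _, c := range cands[1:] {
		if finalScore(c) > finalScore(best) {
			best = c
		}
	}
	return best
}

func main() {
	cands := []candidate{
		{"same-linkZone", 4, 1}, // all GPUs inside one MetaXLink zone
		{"cross-zone", 2, 0},    // spans zones, but damages nothing
	}
	fmt.Println("chosen:", pickBest(cands).name) // -> same-linkZone
}
```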

**Example: MetaX binpack node-scheduling policy**

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod1
  annotations:
    hami.io/node-scheduler-policy: "binpack"
spec:
  containers:
    - name: ubuntu-container
      image: cr.metax-tech.com/public-ai-release/c500/colossalai:...
      resources:
        limits:
          metax-tech.com/gpu: 1
```

**Example: MetaX sGPU with QoS policy and compute/memory limits**

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  annotations:
    metax-tech.com/sgpu-qos-policy: "best-effort"
spec:
  containers:
    - name: ubuntu-container
      image: ubuntu:22.04
      resources:
        limits:
          metax-tech.com/sgpu: 1
          metax-tech.com/vcore: 60
          metax-tech.com/vmemory: 4
```

The remaining examples preview the resource types covered in the feature sections below.

**Example: Kunlun vXPU with UUID selection**

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kunlun-vxpu-demo
  annotations:
    hami.io/use-xpu-uuid: "KL-UUID-01,KL-UUID-03"
spec:
  containers:
    - name: my-app
      resources:
        limits:
          kunlunxin.com/vxpu: 1
          kunlunxin.com/vxpu-memory: 24576
```

**Example: AWS Neuron core-level request**

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nuropod
spec:
  containers:
    - name: nuropod
      resources:
        limits:
          aws.amazon.com/neuroncore: 1
```

**Example: Enflame GCU sharing with UUID targeting**

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gcu-shared-pod-with-uuid
  annotations:
    enflame.com/use-gpuuuid: "node1-enflame-0"
spec:
  containers:
    - name: my-app
      resources:
        limits:
          enflame.com/vgcu: 1
          enflame.com/vgcu-percentage: 25
```

**Example: NVIDIA topology-aware scheduling**

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-topology-aware-job
  annotations:
    hami.io/gpu-scheduler-policy: "topology-aware"
spec:
  containers:
    - name: cuda-container
      resources:
        limits:
          nvidia.com/gpu: "4"
```

**Example: extended ResourceQuota for vGPU resources**

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
spec:
  hard:
    limits.nvidia.com/gpu: "2"
    limits.nvidia.com/gpumem: "3000"
```

**Example: Volcano dynamic MIG mode**

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod1
  annotations:
    volcano.sh/vgpu-mode: "mig"
spec:
  containers:
    - name: ubuntu-container
      resources:
        limits:
          volcano.sh/vgpu-number: 1
          volcano.sh/vgpu-memory: 8000
```

---

## **Acknowledgements**
Thanks to all community developers and hardware vendors involved: **@Kyrie336**, **@darker-thanBlack**, **@ouyangluwei163**, **@FouoF**, **@archlitchi**, **@zhaikangqi331**, **@lengrongfu**, **@fyp711**, **@Wangmin362**, **@andresd95**, **@calvin0327**, and teams from **MetaX**, **Baidu Kunlun**, **AWS Neuron**, **Enflame**, **NVIDIA**.

---


### **Aggregated Scheduling Events**
- Counts each node's filtering-failure reason (e.g., `CardInsufficientMemory`)
- Writes the aggregated summary into the Pod's warning events
- Success path → logs the matched nodes and their scores
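
As a rough illustration (not HAMi's actual implementation), the aggregation step can be thought of as counting each node's failure reason and composing one summary string for the warning event; the `aggregateFailures` helper and the second reason string are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// aggregateFailures collapses per-node filtering failures (node -> reason)
// into a single human-readable summary such as
// "2/3 nodes CardInsufficientMemory; 1/3 nodes CardNotHealthy".
func aggregateFailures(reasons map[string]string) string {
	counts := map[string]int{}
	for _, reason := range reasons {
		counts[reason]++
	}
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic ordering for the event message
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%d/%d nodes %s", counts[k], len(reasons), k))
	}
	return strings.Join(parts, "; ")
}

func main() {
	msg := aggregateFailures(map[string]string{
		"node-a": "CardInsufficientMemory",
		"node-b": "CardInsufficientMemory",
		"node-c": "CardNotHealthy", // hypothetical reason string
	})
	// In the scheduler this summary would be written into a Warning event
	// on the Pod; here we just print it.
	fmt.Println(msg)
}
```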

---

## **Application Ecosystem**

### **vLLM**
- Async GPU memory allocation fix
- Accurate memory statistics
- NCCL comms optimization
- Native HAMi resource variable support

### **Xinference**
- HAMi vGPU integration in Helm Chart
- Passes `gpucores` and `gpumem-percentage` to Supervisor/Worker

---

## **Volcano Dynamic MIG**
- Dynamic MIG partitioning
- Best-Fit sizing
- Supports `vgpu-number`, `vgpu-memory`, `vgpu-cores`
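
A minimal sketch of the Best-Fit idea, assuming a fixed list of candidate MIG profiles with their memory sizes; the profile list and the `bestFit` helper are illustrative, not Volcano's real data structures.

```go
package main

import "fmt"

// migProfile is a simplified stand-in for one MIG geometry option.
type migProfile struct {
	name     string
	memoryMB int
}

// bestFit picks the smallest profile that still satisfies the request,
// which is the essence of Best-Fit sizing for dynamic MIG partitioning.
func bestFit(profiles []migProfile, requestMB int) (migProfile, bool) {
	var best migProfile
	found := false
	for _, p := range profiles {
		if p.memoryMB < requestMB {
			continue // too small to satisfy the request
		}
		if !found || p.memoryMB < best.memoryMB {
			best, found = p, true
		}
	}
	return best, found
}

func main() {
	profiles := []migProfile{
		{"1g.10gb", 10_000},
		{"2g.20gb", 20_000},
		{"3g.40gb", 40_000},
	}
	// A pod asking for volcano.sh/vgpu-memory: 8000 fits the smallest slice.
	if p, ok := bestFit(profiles, 8000); ok {
		fmt.Println("selected profile:", p.name) // -> 1g.10gb
	}
}
```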

---

## **Core Scheduler Optimizations**

### **Extended ResourceQuota**
- **Understands resource associations**: linked vGPU resources (device count and device memory) are counted together, so multi-GPU requests are charged accurately against the quota
- **Handles dynamic, percentage-based resource requests**
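
A hedged sketch of how a percentage-based memory request could be resolved into the absolute values charged against the quota; the `gpuRequest` type and its field names are assumptions for illustration, not HAMi's API.

```go
package main

import "fmt"

// gpuRequest is a simplified view of a pod's vGPU request. In real manifests
// these values come from limits such as nvidia.com/gpu, nvidia.com/gpumem or
// a percentage-style memory request.
type gpuRequest struct {
	gpus          int
	memPerGPUMB   int     // absolute per-GPU request, 0 if a percentage is used
	memPercentage float64 // fraction of the card, 0 if an absolute value is used
}

// quotaUsage converts one request into the two quantities charged against the
// extended ResourceQuota: total GPUs and total GPU memory. Percentage requests
// are resolved against the card's physical memory first, which is what makes
// the accounting accurate for dynamic requests.
func quotaUsage(r gpuRequest, cardMemMB int) (gpus int, memMB int) {
	per := r.memPerGPUMB
	if per == 0 && r.memPercentage > 0 {
		per = int(float64(cardMemMB) * r.memPercentage)
	}
	return r.gpus, r.gpus * per
}

func main() {
	// Two GPUs at 50% of a 24,000 MB card count as 2 GPUs / 24,000 MB
	// against the quota's gpu and gpumem limits.
	g, m := quotaUsage(gpuRequest{gpus: 2, memPercentage: 0.5}, 24_000)
	fmt.Println(g, m) // -> 2 24000
}
```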

---

## **NVIDIA GPU — NVLink/PCIe Topology Scheduling**

**Workflow:**
1. **Topology Registration**
   - Detect NVLink vs PCIe connections via NVML
   - Build and annotate “connection matrix”
2. **Scheduling Decision**
   - Filter by NVLink group availability
   - Score for best fit or minimal topology damage
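
A simplified sketch of the scoring idea, assuming the connection matrix has already been built; the `linkType` classification and the weights are illustrative, and in the real workflow the matrix would come from NVML queries attached to the node as an annotation.

```go
package main

import "fmt"

// linkType is a simplified classification of how two GPUs are connected.
type linkType int

const (
	linkNone linkType = iota
	linkPCIe
	linkNVLink
)

// connMatrix[i][j] records the detected link between GPU i and GPU j.
type connMatrix [][]linkType

// scoreSet sums pairwise link quality inside a candidate GPU set, so sets
// fully connected by NVLink outrank PCIe-only sets. Weights are illustrative.
func scoreSet(m connMatrix, gpus []int) int {
	weights := map[linkType]int{linkNone: 0, linkPCIe: 1, linkNVLink: 10}
	score := 0
	for i := 0; i < len(gpus); i++ {
		for j := i + 1; j < len(gpus); j++ {
			score += weights[m[gpus[i]][gpus[j]]]
		}
	}
	return score
}

func main() {
	// 4 GPUs: 0-1 and 2-3 are NVLink pairs, everything else is PCIe.
	m := connMatrix{
		{linkNone, linkNVLink, linkPCIe, linkPCIe},
		{linkNVLink, linkNone, linkPCIe, linkPCIe},
		{linkPCIe, linkPCIe, linkNone, linkNVLink},
		{linkPCIe, linkPCIe, linkNVLink, linkNone},
	}
	fmt.Println(scoreSet(m, []int{0, 1})) // NVLink pair -> 10
	fmt.Println(scoreSet(m, []int{0, 2})) // PCIe pair   -> 1
}
```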

---

## **Enflame GCU — gcushare & Percentage Slicing**

**Features:**
- vGCU sharing
- Percentage-based compute/memory allocation
- Device UUID targeting

---

## **AWS Neuron — Core-Level Scheduling**

**Features:**
- Request at NeuronCore granularity
- Topology-aware contiguous allocation
- Simplified user resource declaration
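
A minimal sketch of contiguous-core selection, under the assumption that free cores are tracked as a boolean slice; the `firstContiguous` helper is illustrative, not the Neuron device plugin's real logic.

```go
package main

import "fmt"

// firstContiguous returns the starting index of the first run of `count`
// free NeuronCores, or -1 if none exists. Allocating a contiguous run keeps
// the cores on adjacent interconnects, which is the point of topology-aware
// NeuronCore scheduling.
func firstContiguous(free []bool, count int) int {
	run := 0
	for i, ok := range free {
		if !ok {
			run = 0
			continue
		}
		run++
		if run == count {
			return i - count + 1
		}
	}
	return -1
}

func main() {
	// Cores 0 and 3-5 are free; a request for 2 cores lands on cores 3-4.
	free := []bool{true, false, false, true, true, true}
	fmt.Println(firstContiguous(free, 2)) // -> 3
}
```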

---

## **Kunlun XPU — vXPU Partitioning & Topology Awareness**

**Features:**
- Hybrid full-card & vXPU deployment
- Auto memory alignment to hardware specs
- Topology-aware scheduling (leftwing/rightwing grouping)
- vXPU UUID whitelist/blacklist for fine-grained control
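
A small sketch of the memory-alignment step, assuming a known list of supported vXPU partition sizes; the sizes and the `alignVXPUMemory` helper are placeholders for illustration, not the actual Kunlun device specification.

```go
package main

import "fmt"

// alignVXPUMemory rounds a vXPU memory request (MB) up to the nearest
// partition size the hardware actually supports. The supported list must be
// sorted ascending; in practice it would come from the device specification.
func alignVXPUMemory(requestMB int, supported []int) (int, bool) {
	for _, s := range supported {
		if s >= requestMB {
			return s, true
		}
	}
	return 0, false // request exceeds the largest partition
}

func main() {
	supported := []int{12288, 24576, 49152} // hypothetical partition sizes
	aligned, ok := alignVXPUMemory(20000, supported)
	fmt.Println(aligned, ok) // -> 24576 true, matching the vXPU example above
}
```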

---

### **QoS Scheduling Logic**
- First Pod sets GPU QoS “identity” — subsequent tasks must match.
- Idle GPU → Any QoS allowed.
- In-use GPU → Only matching QoS tasks scheduled.
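
A minimal sketch of this matching rule; the `qosAllowed` helper is illustrative, and the policy strings follow the annotation values shown in the MetaX sGPU example above.

```go
package main

import "fmt"

// qosAllowed implements the rule described above: an idle GPU accepts any QoS
// policy, while a GPU already running tasks only accepts pods that declare the
// same policy as the first pod placed on it.
func qosAllowed(currentQoS string, inUse bool, requestedQoS string) bool {
	if !inUse {
		return true // idle GPU: the first pod sets the QoS identity
	}
	return currentQoS == requestedQoS
}

func main() {
	fmt.Println(qosAllowed("", false, "fixed-share"))           // -> true
	fmt.Println(qosAllowed("best-effort", true, "best-effort")) // -> true
	fmt.Println(qosAllowed("best-effort", true, "fixed-share")) // -> false
}
```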
