# **HAMi v2.7.0 — Unified GPU Scheduling Across Heterogeneous Hardware**
> **Source:** Dynamia AI
---
## **Of Silicon & Scheduling — Connecting All Chips Through One Order**
A tribute to *Kubernetes 1.34*’s *Of Wind & Will*. There, the path is defined by *wind* and *will*; here, we navigate by **silicon** and **order**.
- **Silicon** — The computing essence, varied in form and temperament.
- **Order** — The human-imposed rhythm and structure, which enables navigation through complexity.
When diverse chips converge on the same sea, we cannot predict wind direction, but we **guarantee an order capable of carrying them forward**. Releases emerge not because perfection is achieved, but because **order lets imperfection run in parallel**.
---
## **Release Overview**
We’re proud to announce the **HAMi v2.7.0** release — delivering:
- **Broader hardware ecosystem coverage**
- **Core scheduler optimizations**
- **Critical stability improvements**
- **Application ecosystem integrations**
- **Community growth**
- **WebUI enhancements**

---
## **Highlights at a Glance**
### **Hardware Ecosystem**
- Support for **Kunlun XPU, Enflame (Suiyuan) GCU, AWS Neuron**
- **MetaX**: sGPU sharing (compute/memory) + 3 QoS modes
- **MetaXLink**: Topology awareness + intelligent scheduling
- NVIDIA GPU topology scheduling upgrade
### **Scheduler Optimizations**
- Scheduling failure event aggregation
- Detection and handling of abnormal NVIDIA cards
- Extended **ResourceQuota** — accurate quota calculation for multi-GPU requests
- Improved **observability & robustness**
### **Application Integrations**
- **vLLM** compatibility enhancements
- **Xinference** Helm Chart integration (HAMi vGPU support)
- **Volcano Dynamic MIG** support
### **Community**
- New contributors, reviewers, maintainers
- CNCF case studies and talks demonstrating broad adoption
### **WebUI Enhancements**
- End-to-end monitoring integration for **Metax**
- Better heterogeneous metric visualization
---
## **Community Updates**
### CNCF Case Studies
- **SF Technology (Effective GPU)**: Large-scale heterogeneous compute pooling & scheduling
→ [Case study 1]
- **PREP-EDU**: Improved AI training efficiency in educational platforms
→ [Case study 2]
### Recognitions
- **vCluster** technical seminar: HAMi's proxy-layer CUDA API interception highlighted as enabling fine-grained GPU governance
→ Replay [3]
- **Linux Foundation AI_dev Summit**: Showcased flexible GPU slicing and software-defined isolation
→ Replay [4]
- **Vietnam Telecom**: GPU + eBPF for Kubernetes observability and management
→ [Meetup 5] & [YouTube 6]
---
# **Feature Deep Dive**
---
## **MetaX — sGPU Sharing, QoS Management, Topology-Aware Scheduling**
**Key Features:**
1. **GPU Sharing (sGPU)** — Multiple containers share a physical GPU.
2. **Resource Isolation** — Limit GPU memory and compute cores per task.
3. **Topology Awareness** — Prefers high-bandwidth GPU groups (MetaXLink, PCIe Switch).
4. **QoS Policies** — **BestEffort**, **FixedShare**, **BurstShare**.
5. **Health & Monitoring** — WebUI integration with clear cluster-wide metrics.
### **Topology Optimization Principle**
**Goal**: Efficient multi-GPU job execution within high-speed interconnect groups.
**Two-stage decision:**
- **Stage 1 — Intra-node selection (priority rules)**:
  - Group GPUs by linkZone ID.
  - Highest priority: all requested GPUs fit within the same linkZone.
  - Next: cross-domain groups, then fall back to unknown topology.
- **Stage 2 — Inter-node scoring**: `Final Score = (10 * allocatedScore) - lossScore`
  - `allocatedScore` → how tightly connected the selected GPUs are.
  - `lossScore` → penalty for degrading the topology left for future workloads.
**Modes:**
- `binpack` → Minimize topology damage.
- `spread` → Maximize bandwidth grouping.
**Example — MetaX topology-aware scheduling with `binpack`:**

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod1
  annotations:
    hami.io/node-scheduler-policy: "binpack"
spec:
  containers:
    - name: ubuntu-container
      image: cr.metax-tech.com/public-ai-release/c500/colossalai:...
      resources:
        limits:
          metax-tech.com/gpu: 1
```

**Example — MetaX sGPU with `best-effort` QoS and capped compute/memory:**

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  annotations:
    metax-tech.com/sgpu-qos-policy: "best-effort"
spec:
  containers:
    - name: ubuntu-container
      image: ubuntu:22.04
      resources:
        limits:
          metax-tech.com/sgpu: 1
          metax-tech.com/vcore: 60
          metax-tech.com/vmemory: 4
```

The remaining manifests preview the other integrations covered in the deep-dive sections below.

**Example — Kunlun XPU vXPU with UUID targeting:**

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kunlun-vxpu-demo
  annotations:
    hami.io/use-xpu-uuid: "KL-UUID-01,KL-UUID-03"
spec:
  containers:
    - name: my-app
      resources:
        limits:
          kunlunxin.com/vxpu: 1
          kunlunxin.com/vxpu-memory: 24576
```

**Example — AWS Neuron core request:**

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nuropod
spec:
  containers:
    - name: nuropod
      resources:
        limits:
          aws.amazon.com/neuroncore: 1
```

**Example — Enflame GCU sharing with UUID targeting:**

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gcu-shared-pod-with-uuid
  annotations:
    enflame.com/use-gpuuuid: "node1-enflame-0"
spec:
  containers:
    - name: my-app
      resources:
        limits:
          enflame.com/vgcu: 1
          enflame.com/vgcu-percentage: 25
```

**Example — NVIDIA topology-aware scheduling:**

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-topology-aware-job
  annotations:
    hami.io/gpu-scheduler-policy: "topology-aware"
spec:
  containers:
    - name: cuda-container
      resources:
        limits:
          nvidia.com/gpu: "4"
```

**Example — Extended ResourceQuota:**

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
spec:
  hard:
    limits.nvidia.com/gpu: "2"
    limits.nvidia.com/gpumem: "3000"
```

**Example — Volcano dynamic MIG:**

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod1
  annotations:
    volcano.sh/vgpu-mode: "mig"
spec:
  containers:
    - name: ubuntu-container
      resources:
        limits:
          volcano.sh/vgpu-number: 1
          volcano.sh/vgpu-memory: 8000
```
---
## **Acknowledgements**
Thanks to all community developers and hardware vendors involved: **@Kyrie336**, **@darker-thanBlack**, **@ouyangluwei163**, **@FouoF**, **@archlitchi**, **@zhaikangqi331**, **@lengrongfu**, **@fyp711**, **@Wangmin362**, **@andresd95**, **@calvin0327**, and teams from **MetaX**, **Baidu Kunlun**, **AWS Neuron**, **Enflame**, **NVIDIA**.
---
## **Aggregated Scheduling Events**
- Counts filter-stage failure reasons (e.g., `CardInsufficientMemory`)
- Writes the aggregated counts into warning events
- On success, logs the matched nodes and their scores
---
## **Application Ecosystem**
### **vLLM**
- Async GPU memory allocation fix
- Accurate memory statistics
- NCCL comms optimization
- Native HAMi resource variable support
### **Xinference**
- HAMi vGPU integration in Helm Chart
- Passes `gpucores` and `gpumem-percentage` to Supervisor/Worker
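The `gpucores` and `gpumem-percentage` values map onto HAMi's per-container NVIDIA vGPU limits. A minimal sketch of the kind of container spec the chart ends up requesting (pod and container names are illustrative; the exact Helm values keys are defined by the Xinference chart and not shown here):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: xinference-worker-example   # illustrative name, not generated by the chart
spec:
  containers:
    - name: worker
      resources:
        limits:
          nvidia.com/gpu: 1                 # one vGPU slice
          nvidia.com/gpucores: 50           # ~50% of the card's compute
          nvidia.com/gpumem-percentage: 50  # ~50% of the card's memory
```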
---
## **Volcano Dynamic MIG**
- Dynamic MIG partitioning
- Best-Fit sizing
- Supports `vgpu-number`, `vgpu-memory`, `vgpu-cores`
---
## **Core Scheduler Optimizations**
### **Extended ResourceQuota**
- **Understands resource associations** (e.g., a per-GPU memory limit is counted once per requested GPU)
- **Handles dynamic, percentage-based resource requests**
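To make the association concrete: a pod that requests several GPUs with a per-GPU memory limit should be charged GPU count times that limit against the quota. A minimal sketch, assuming the quota object from the example above and HAMi's standard NVIDIA resource names (values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: quota-demo   # illustrative name
spec:
  containers:
    - name: cuda-container
      resources:
        limits:
          nvidia.com/gpu: 2        # two vGPUs
          nvidia.com/gpumem: 1500  # per-GPU memory (MiB); counted as 2 x 1500 = 3000 against the quota
```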
---
## **NVIDIA GPU — NVLink/PCIe Topology Scheduling**
**Workflow:**
1. **Topology Registration**
- Detect NVLink vs PCIe connections via NVML
- Build and annotate “connection matrix”
2. **Scheduling Decision**
- Filter by NVLink group availability
- Score for best fit or minimal topology damage
---
## **Enflame GCU — gcushare & Percentage Slicing**
**Features:**
- vGCU sharing
- Percentage-based compute/memory allocation
- Device UUID targeting
---
## **AWS Neuron — Core-Level Scheduling**
**Features:**
- Request at NeuronCore granularity
- Topology-aware contiguous allocation
- Simplified user resource declaration
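Contiguity only comes into play once a pod asks for more than one core. A minimal variation of the manifest shown earlier, requesting four NeuronCores that the scheduler should place contiguously (pod name is illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: neuron-multicore-demo   # illustrative name
spec:
  containers:
    - name: my-app
      resources:
        limits:
          aws.amazon.com/neuroncore: 4   # four cores, placed contiguously by the scheduler
```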
---
## **Kunlun XPU — vXPU Partitioning & Topology Awareness**
**Features:**
- Hybrid full-card & vXPU deployment
- Auto memory alignment to hardware specs
- Topology-aware scheduling (leftwing/rightwing grouping)
- vXPU UUID whitelist/blacklist for fine-grained control
---
## **QoS Scheduling Logic**
- First Pod sets GPU QoS “identity” — subsequent tasks must match.
- Idle GPU → Any QoS allowed.
- In-use GPU → Only matching QoS tasks scheduled.
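To illustrate the matching rule with the MetaX sGPU annotation shown earlier, here is a minimal sketch with two pods, assuming `"fixed-share"` is the annotation value corresponding to the FixedShare policy (an assumption; names and values are illustrative). If the first pod lands on an idle GPU, that GPU takes on a `best-effort` identity, so the second pod must be placed on a different or idle GPU.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sgpu-best-effort   # illustrative name
  annotations:
    metax-tech.com/sgpu-qos-policy: "best-effort"   # first pod stamps the GPU's QoS identity
spec:
  containers:
    - name: app-a
      resources:
        limits:
          metax-tech.com/sgpu: 1
---
apiVersion: v1
kind: Pod
metadata:
  name: sgpu-fixed-share   # illustrative name
  annotations:
    metax-tech.com/sgpu-qos-policy: "fixed-share"   # assumed value; mismatched QoS cannot share the same GPU
spec:
  containers:
    - name: app-b
      resources:
        limits:
          metax-tech.com/sgpu: 1
```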