Oracle’s MySQL Cluster: Root Causes of Failure and Design Flaws
Oracle’s MySQL Cluster (NDB) — A Critical Overview
Many remember MySQL, but fewer recall the NDB project within it.
If we say MySQL Cluster, though, that might sound familiar.
MySQL once introduced a high-availability mode called MySQL Cluster.
Some consider it a failed product — here’s why.
---
Problem 1 — Chaotic Product Design
Oracle already had RAC (Real Application Clusters), and MySQL Cluster is often read as an attempt to offer a comparable clustering story for MySQL.
Key questions Oracle should have answered before launching this:
- Who are MySQL’s customers?
- Do they share the same needs as Oracle RAC users?
- Does market research support this crossover?
Architecture Comparison
- Oracle RAC: Shared-everything cluster, multiple machines connected via high-speed interconnect, all sharing the same disk.
- MySQL NDB: Built for telecom billing systems, optimized for millisecond response, high availability, and redundancy — handling short, structured transactions.
So why position NDB as MySQL's clustering answer, given that Oracle already had RAC and that NDB originated in telecom rather than as a general-purpose database?
---
NDB’s Core Design and Limitations
Architecture: Shared-nothing, sharded data across multiple nodes.
Strengths:
- Extreme speed on primary key lookups.
Weaknesses:
- Performance collapse on queries involving `JOIN` or `GROUP BY`.
- Coordination overhead across nodes causes major network communication delays.
- Pushdown optimizations are limited (e.g., by JOIN column types).
> Expert Observation: Complex reporting queries on MySQL Cluster can perform worse than a normal InnoDB database.
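
To make the contrast between primary key lookups and joins concrete, here is a minimal sketch of a hash-partitioned, shared-nothing store. It is a toy model, not NDB's actual partitioning: the modulo hash, the node count, and the sample data are all assumptions chosen for illustration. It only counts how many data nodes a query has to contact.

```python
NUM_NODES = 4  # hypothetical cluster size


def node_for(key):
    """Map a primary key to a data node (simple modulo hash; an assumption)."""
    return key % NUM_NODES


def pk_lookup_nodes(key):
    """A primary-key lookup is routed to exactly one node."""
    return {node_for(key)}


def join_nodes(order_ids, customer_of):
    """A join visits the node holding each order row and the node
    holding the matching customer row, which is usually a different node."""
    touched = set()
    for oid in order_ids:
        touched.add(node_for(oid))
        touched.add(node_for(customer_of[oid]))
    return touched


# Made-up data: each order points at some customer id.
customer_of = {oid: oid * 7 + 1 for oid in range(100)}

print(len(pk_lookup_nodes(42)))                  # 1 node contacted
print(len(join_nodes(range(100), customer_of)))  # 4 nodes contacted
```

In this model the lookup stays on one node no matter how large the cluster grows, while the join ends up talking to every node, which is where the coordination overhead comes from.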

---
Summary of Core Issues
Weak support for complex queries isn't unique to MySQL Cluster. It is an inherent limitation of any design that physically shards data across nodes.

---
Further Defense (and Real Drawbacks)
Hardware Budget
- Requires many servers with large RAM.
- Often needs dedicated subnets and high-end NICs.
- Software is free, but infrastructure and operations are costly.
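
As a rough illustration of why the hardware bill grows, the sketch below estimates how many data nodes an in-memory, replicated dataset needs. The formula and every default (replica count, RAM per node, the 1.3 overhead factor for indexes and buffers) are assumptions for illustration, not official sizing guidance.

```python
import math


def data_nodes_needed(dataset_gb, replicas=2, ram_per_node_gb=64, overhead=1.3):
    """Estimate data nodes for an in-memory dataset.

    All defaults are hypothetical; `overhead` is a guessed multiplier
    for indexes and buffers.
    """
    total_gb = dataset_gb * replicas * overhead
    nodes = math.ceil(total_gb / ram_per_node_gb)
    # The number of data nodes has to be a multiple of the replica count.
    return math.ceil(nodes / replicas) * replicas


print(data_nodes_needed(500))  # 22
```

Under these assumptions a 500 GB dataset already calls for 22 machines with 64 GB of RAM each, before counting management and SQL nodes.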
Configuration & Tuning
- NDB is uncommon — tuning tips for mainstream MySQL (InnoDB) don’t apply.
- Skilled NDB DBAs are rare; training an existing DBA takes time and money.
Schema Design
- Sharding requires carefully designed schemas.
- Simple queries per shard → good performance.
- Cross-shard range queries → poor performance.
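
The effect of the partitioning key on a common query can be sketched as follows. This is a toy model with made-up data and a modulo hash (both assumptions); it compares partitioning an orders table by `order_id` with partitioning it by `customer_id` for the query "all orders of one customer".

```python
NUM_SHARDS = 4  # hypothetical shard count


def shard(value):
    """Toy hash: modulo on an integer key (an assumption)."""
    return value % NUM_SHARDS


# Made-up rows of (order_id, customer_id).
orders = [(oid, oid % 5) for oid in range(40)]


def shards_touched(rows, customer_id, partition_col):
    """Shards that hold this customer's orders under the chosen partition column."""
    col = {"order_id": 0, "customer_id": 1}[partition_col]
    return {shard(r[col]) for r in rows if r[1] == customer_id}


print(len(shards_touched(orders, 3, "order_id")))  # 4: every shard is involved
print(shards_touched(orders, 3, "customer_id"))    # {3}: a single shard
```

The same query goes from a cluster-wide fan-out to a single-shard read purely through the choice of partition column, which is why schema design carries so much weight here.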

---
Key Drawback Summary
- Memory-intensive and needs extra hosts.
- Not fully compatible with standard MySQL operations.
- Poor performance for complex SQL (especially `JOIN`).
- Migrating existing apps requires redesign.
- Optimized for primary key queries only.
---
Documentation Gaps
Official MySQL Cluster documentation glosses over:
- Query complexity issues
- Application redesign requirements
Historical Note:
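NDB was not created by Oracle. It originated at Ericsson for telecom workloads, was acquired by MySQL AB in 2003 (through the Ericsson venture Alzato), and only came under Oracle's ownership with the Sun acquisition in 2010.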


---
Architecture Facts
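- Shared-nothing, distributed design with in-memory storage and synchronous replication.
- Data is partitioned across data nodes and held in RAM by default, with each partition replicated within its node group (disk-based table data is possible, at a large performance cost).
- Updates require two-phase commit across nodes → scaling increases write latency.
- `JOIN` is supported but performs poorly for general workloads, so a KV access model is recommended.
- Lacks a full distributed query optimizer (join pushdown exists but is limited).
- Missing or restricted MySQL features: no full-text or spatial indexes on NDB tables, foreign keys only since NDB 7.3, and triggers and stored procedures that must be created separately on each SQL node.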
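
To show why two-phase commit ties write latency to the slowest participant, here is a minimal sketch of a generic two-phase commit. It is not NDB's actual protocol: the `Participant` class, the round-trip figures, and the "two rounds times slowest node" latency model are all assumptions for illustration.

```python
class Participant:
    """Hypothetical participant node with a fixed round-trip time."""

    def __init__(self, name, rtt_us, can_commit=True):
        self.name = name
        self.rtt_us = rtt_us
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        """Phase 1: vote on whether the transaction can commit."""
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit


def two_phase_commit(participants):
    """Return (committed, latency_us) for one transaction."""
    votes = [p.prepare() for p in participants]          # phase 1
    decision = "committed" if all(votes) else "aborted"
    for p in participants:                               # phase 2
        p.state = decision
    # Each phase waits for the slowest participant to answer.
    slowest = max(p.rtt_us for p in participants)
    return decision == "committed", 2 * slowest


small = [Participant("n1", 200), Participant("n2", 300)]
large = small + [Participant("n3", 1500)]
print(two_phase_commit(small))  # (True, 600)
print(two_phase_commit(large))  # (True, 3000)
```

Adding one slow or distant node raises the commit latency for every transaction that involves it, even though the other nodes are unchanged.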
---
Management Complexity:
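A cluster involves three kinds of processes: `ndb_mgmd` (management node), `ndbd` (data nodes), and `mysqld` (SQL nodes). Configuration is highly sensitive, and the startup sequence matters: management node first, then data nodes, then SQL nodes.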
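
The startup order can be sketched as a small planning helper. This only builds an ordered list of commands and does not run anything; the host names, the config path, and the `startup_plan` helper are hypothetical, and the commands show only the basic connection options.

```python
# Management node first, then data nodes, then SQL nodes.
START_ORDER = ("ndb_mgmd", "ndbd", "mysqld")

# Hypothetical host name and config path; 1186 is the default management port.
COMMANDS = {
    "ndb_mgmd": "ndb_mgmd --config-file=/etc/ndb/config.ini",
    "ndbd": "ndbd --ndb-connectstring=mgm-host:1186",
    "mysqld": "mysqld --ndbcluster --ndb-connectstring=mgm-host:1186",
}


def startup_plan(nodes):
    """Sort (host, role) pairs into a safe start order and attach commands."""
    ordered = sorted(nodes, key=lambda n: START_ORDER.index(n[1]))
    return [(host, COMMANDS[role]) for host, role in ordered]


nodes = [
    ("sql1", "mysqld"),
    ("data1", "ndbd"),
    ("mgm-host", "ndb_mgmd"),
    ("data2", "ndbd"),
]
for host, cmd in startup_plan(nodes):
    print(f"{host}: {cmd}")
```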
---
> Forum comment: “The biggest fear is something going wrong — because fixing it can take half a day.”
---
NDB Test Scenarios
| Test Scenario | Expected Result |
|---|---|
| Simple primary key KV writes | Good performance, near-linear scalability |
| Cross-partition join | Severe slowdown, latency spikes |
| Increase node count to 8–10 | Unstable write latency, higher failure rates |
| Scale SQL nodes horizontally | No complex query speed gains |
---
Simulation: Network Jitter + Node Restart
When network fluctuations occur alongside node restarts, recovery can be slow, sometimes leading to:
- Transaction suspension
- Data inconsistency
- Delays in failure detection and arbitration
Recommendations:
- Test network fault injection in QA.
- Tune heartbeat and arbitration timeouts (for example `HeartbeatIntervalDbDb` and `ArbitrationTimeout`).
- After a period of jitter, delay node restarts until replication has caught up.
- Aggressively monitor transaction queues during instability.
- Use multi-region replication to contain the impact of a failure.
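
The advice to delay restarts after jitter could be enforced with a simple gate like the one below. This is a hypothetical helper, not part of any MySQL tooling: the threshold, the number of quiet samples, and the source of the latency readings are all assumptions.

```python
def safe_to_restart(latencies_ms, threshold_ms=5.0, quiet_samples=3):
    """Return True once the most recent `quiet_samples` network latency
    readings are all below `threshold_ms` (hypothetical policy)."""
    if len(latencies_ms) < quiet_samples:
        return False  # not enough evidence that the network has settled
    return all(x < threshold_ms for x in latencies_ms[-quiet_samples:])


print(safe_to_restart([1.0, 40.0, 2.0, 1.0]))       # False: spike still in window
print(safe_to_restart([1.0, 40.0, 2.0, 1.0, 1.0]))  # True: three quiet samples
```

A gate of this kind keeps an operator or an automation script from restarting a node while the network is still unstable, which is the combination the scenario above warns about.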
---
Final Takeaway
MySQL Cluster (NDB):
- Built for telecom-specific workloads → unsuited for general DB use.
- Poor fit for complex query workloads.
- Scaling nodes doesn’t guarantee performance gains.
- Operational & management costs are high.