How to Ensure 100% Message Reliability When Using MQ

The Interview Trap: Ultimate MQ Reliability

> A friend of mine was interviewing at Meituan. Everything went well — until the final question on Message Queues (MQ):

> “When using MQ, how do you ensure messages never get lost — 100% guaranteed?”

> He replied with the ACK mechanism… and nothing more. The interview ended.

This is a high-stakes backend interview classic — especially for mid-to-senior candidates.

Why It Matters

It’s not just about knowing an MQ API. It tests your system design thinking:

  • Reliability
  • Consistency
  • End-to-end architecture awareness

Most answers only cover producer ACKs or consumer manual commits — far from complete.

We need end-to-end protection across the entire message lifecycle, which we’ll break into the Three Axes Framework.

---

Message Loss: The Three Risk Stages

To prevent loss, first identify where it can occur:

  • Production – Message not reaching the Broker.
  • Storage (Broker) – Broker crash before persistence or replication.
  • Consumption – Consumer crash before completing logic; offsets mismanaged.

---

Production Risk

Network failures, Broker downtime, or producer crashes can prevent the message from ever reaching the Broker.

Storage Risk

Broker receives the message, but fails before writing to disk or replicating to followers.

Consumption Risk

Consumer processes partially, but commits offset prematurely — message gone forever.

---

First Axe: Producer-Side Reliability — Safe Departure

Goal: Ensure the Broker confirms receipt.

ACK Confirmation & `acks` Parameter

Kafka producers send asynchronously by default.

`acks` settings impact reliability:

  • `acks=0` – Fire-and-forget, fastest, highest loss risk.
  • `acks=1` – Wait for Leader write, not followers.
  • `acks=all` – Wait for all In-Sync Replicas (ISR) confirmation — highest reliability.

Interview Gold Tip:

Use `acks=all` + configure `retries` to handle transient failures.
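As a concrete sketch, a reliability-first producer configuration might look like the following. Key names follow the confluent-kafka Python client's conventions; the broker addresses and values are illustrative, and `enable.idempotence` is an extra safeguard beyond what the tip above mentions:

```python
# Sketch of a reliability-first Kafka producer configuration.
# Key names follow the confluent-kafka Python client; values are
# illustrative, not tuned for any particular workload.
producer_config = {
    "bootstrap.servers": "broker1:9092,broker2:9092,broker3:9092",
    "acks": "all",               # wait for all in-sync replicas before ACK
    "retries": 5,                # retry transient failures (network blips, leader changes)
    "enable.idempotence": True,  # prevent duplicates introduced by retries
}
```

With idempotence enabled, retried sends do not produce duplicate records in the log, so `retries` can be set generously.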

---

Ultimate Safeguard: Local Message Table

Problem: even `acks=all` cannot help if the business transaction and the MQ send aren’t atomic.

Example failure scenario:

```
// BAD: commit happens before the send
START TRANSACTION
UPDATE stock ...
COMMIT
producer.send(...)  // if this fails, the DB change is already durable
```

If send fails after commit, business state changes without informing downstream systems.

Solution:

Use a Local Message Table within the same transaction:

  • Create `local_message` table in business DB.
  • Perform business update + insert message record in one transaction.
  • Background job polls table for "pending" messages, sends to MQ.
  • On ACK, update status or delete record.

This converts an uncertain network send into a certain local write.
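The pattern can be simulated end to end with SQLite standing in for the business database. Table names and the `send_to_mq` callback are illustrative; a real relay would call a Kafka producer and retry rows that were never ACKed:

```python
import sqlite3

# Sketch of the Local Message Table pattern using SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("CREATE TABLE local_message ("
             "id INTEGER PRIMARY KEY, payload TEXT, status TEXT)")
conn.execute("INSERT INTO stock VALUES ('widget', 10)")
conn.commit()

def place_order(item, n):
    # Business update and message record in ONE local transaction:
    # either both become durable, or neither does.
    with conn:
        conn.execute("UPDATE stock SET qty = qty - ? WHERE item = ?", (n, item))
        conn.execute("INSERT INTO local_message (payload, status) "
                     "VALUES (?, 'pending')", (f"order:{item}:{n}",))

def relay(send_to_mq):
    # Background job: poll pending messages, send, mark as sent on ACK.
    rows = conn.execute("SELECT id, payload FROM local_message "
                        "WHERE status = 'pending'").fetchall()
    for msg_id, payload in rows:
        if send_to_mq(payload):  # True means the broker ACKed
            with conn:
                conn.execute("UPDATE local_message SET status = 'sent' "
                             "WHERE id = ?", (msg_id,))

place_order("widget", 2)
sent = []
relay(lambda p: sent.append(p) or True)  # fake send that always ACKs
```

If the process crashes between the transaction and the relay, the row simply stays `pending` and is sent on the next poll, which is exactly the "certain local write" guarantee.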

---

Second Axe: Storage-Side Reliability — Safe Shelter

Even with reliable sending, Broker persistence & HA matter.

Kafka Reliability Parameters

  • `replication.factor ≥ 3` – Leader + followers on different racks.
  • `min.insync.replicas` – Minimum replicas that must acknowledge a write when `acks=all`. A common choice is 2 with a replication factor of 3: writes survive one replica failure, while matching the full replication count maximizes safety at the cost of availability.
  • `unclean.leader.election.enable=false` – Prevent lagging follower promotion; prioritize consistency over temporary availability.
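Applied to a broker or topic, these settings might look roughly like this (a sketch with illustrative values; `replication.factor` is set per topic at creation time, while the other two are broker- or topic-level configs):

```properties
# topic created with: --replication-factor 3
min.insync.replicas=2                  # with acks=all, at least 2 replicas must confirm
unclean.leader.election.enable=false   # never promote an out-of-sync follower
```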

---

Third Axe: Consumer-Side Reliability — Safe Arrival

Final challenge: Avoid “false” consumption.

Wrong Approach

`enable.auto.commit=true`

Offsets commit automatically — messages marked consumed even if processing fails mid-way.

Correct Approach — Manual Commit

`enable.auto.commit=false`

Process batch fully, then commit manually via `commitSync()` or `commitAsync()`.

Flow:

  • Pull messages.
  • Perform all business work.
  • Commit offsets after success.
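The flow above can be sketched without a real broker. `FakeConsumer` below stands in for a Kafka consumer (names are illustrative), and `commit` only advances the stored offset after the whole batch succeeds:

```python
# Sketch: commit offsets only after every message in the batch is processed.
class FakeConsumer:
    def __init__(self, messages):
        self.messages = messages
        self.committed = 0              # last committed offset

    def poll(self, batch_size=3):
        # Redeliver everything after the last committed offset.
        return self.messages[self.committed:self.committed + batch_size]

    def commit(self, offset):
        self.committed = offset         # analogous to commitSync()

def consume_once(consumer, handler):
    batch = consumer.poll()
    for msg in batch:
        handler(msg)                    # may raise; then nothing is committed
    consumer.commit(consumer.committed + len(batch))

consumer = FakeConsumer(["m1", "m2", "m3", "m4"])
processed = []

def flaky_handler(msg):
    # Simulate a crash the first time m3 is processed.
    if msg == "m3" and not flaky_handler.recovered:
        flaky_handler.recovered = True
        raise RuntimeError("crash mid-batch")
    processed.append(msg)

flaky_handler.recovered = False

try:
    consume_once(consumer, flaky_handler)   # fails on m3, commits nothing
except RuntimeError:
    pass
consume_once(consumer, flaky_handler)       # redelivers m1..m3, succeeds
```

Note that `m1` and `m2` are processed twice after the redelivery. That is at-least-once delivery in action, and exactly why the idempotence safety net below is needed.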

---

Idempotence — The Safety Net

Manual commit ensures At-Least-Once delivery. But retries can cause duplicates.

Make consumers idempotent:

  • DB unique constraints
  • Optimistic locks
  • Distributed locks
  • Track processed message IDs
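The first and last techniques combine naturally: a unique constraint on the message ID makes duplicate deliveries a no-op. A minimal sketch with SQLite (table and column names are illustrative):

```python
import sqlite3

# Sketch: idempotent consumer via a unique constraint on the message ID.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (msg_id TEXT PRIMARY KEY)")

applied = []

def handle(msg_id, payload):
    try:
        with db:
            # The PRIMARY KEY rejects a second insert of the same msg_id,
            # so the business effect below runs at most once per message.
            db.execute("INSERT INTO processed (msg_id) VALUES (?)", (msg_id,))
            applied.append(payload)     # the real business side effect
    except sqlite3.IntegrityError:
        pass                            # duplicate delivery: safely ignored

handle("order-42", "ship widget")
handle("order-42", "ship widget")       # redelivered duplicate, no-op
```

Because the insert and the business effect share one transaction, a crash between them rolls both back and the retry starts clean.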

---

High-Scoring Interview Answer Template

> "To ensure 100% lossless message delivery, I build reliability across production, storage, and consumption:

> - Production: Set `acks=all` + `retries`. For critical consistency, use a Local Message Table for atomicity between business ops and sends.

> - Storage: Configure HA: `replication.factor ≥ 3`, `min.insync.replicas > 1`, and `unclean.leader.election.enable=false`.

> - Consumption: Disable auto-commit, use manual commit after successful processing. Ensure idempotence to handle duplicates."

With this three-layer system, you cover all lifecycle risks.

---

Key Takeaway: MQ message safety isn’t solved by a single setting — it’s a full architecture discipline.
