Ultimate Logging System Choice: ELK, EFK, or Loki? Pick Right to Save Hundreds of Thousands!
# Introduction
*"Our logging system burns through 100,000 RMB every month. Is that normal?"* — This was the sobering question posed by a CTO in a tech chat group last year.
As an architect with eight years in the logging domain, I’ve seen countless teams make costly mistakes:
- Some blindly adopt **ELK**, leading to runaway expenses.
- Others switch to **Loki**, only to find it lacks certain features.
- Some are driven to **despair** by the complex configuration of Fluentd.
As of 2025, log collection solutions are mature — yet picking the right stack for your **business scale and budget** still challenges many teams. Based on real project experience, this article compares the **three major solutions — ELK, EFK, and Loki** — in cost, performance, and ease of use.
If you’re wrestling with logging solution selection or cost optimization, this could save you hundreds of thousands per year.
---
## 1. Evolution & Core Requirements of Logging Systems
### Why Logging Matters
In microservices & cloud-native architectures, logging systems have evolved from “nice-to-have” tools to **critical infrastructure** for:
- **Failure troubleshooting**
- **Performance optimization**
- **Security auditing**
**Core purposes:**
- **Centralized management**
- **Real-time search**
- **Visual analysis**
- **Alerting**
- **Storage optimization**
- **Security compliance**
**Key stats (2025):**
- Robust logging cuts MTTR by **70%**.
- Log-driven observability triples operational efficiency.
- A poorly matched stack can inflate logging costs by 200%–500%.
---
### Generations of Logging Architectures
#### **1st Gen (2005–2012)** — *Decentralized*
- SSH + `tail -f` + `grep` / `awk`
- Inefficient; cannot handle microservices/containerization.
#### **2nd Gen (2012–2018)** — *Centralized: ELK Era*
**Architecture:**
Agent → Buffer → Storage → Search/Visualize
**Example stack:** ELK (Elasticsearch + Logstash + Kibana)
**Strength:** Powerful search
**Weakness:** High resource usage
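
The Agent → Buffer → Storage → Search flow is easier to picture with a toy model. The sketch below runs everything in one process; the function names `agent`, `drain`, and `search` are made up for illustration, and a real deployment would run a separate service in each role.

```python
from collections import deque

def agent(lines, buffer):
    """Agent: ship raw lines from one host into the shared buffer."""
    for line in lines:
        buffer.append(line)

def drain(buffer, storage):
    """Storage: move everything waiting in the buffer into the central store."""
    while buffer:
        storage.append(buffer.popleft())

def search(storage, keyword):
    """Search: return stored lines that contain the keyword."""
    return [line for line in storage if keyword in line]

buffer, storage = deque(), []
agent(["GET /health 200", "POST /pay 500 timeout"], buffer)  # host A
agent(["GET /cart 200"], buffer)                             # host B
drain(buffer, storage)
hits = search(storage, "500")
```

The buffer is what lets agents keep shipping while storage is slow or restarting, which is why it sits between the two.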
#### **3rd Gen (2018–Now)** — *Cloud-Native*
- **Lightweight:** Loki
- **Managed SaaS:** AWS CloudWatch, Azure Monitor
- **Cost optimization:** Hot/cold separation + object storage
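
Hot/cold separation comes down to routing data by age. A minimal sketch of that decision, assuming daily indices and two hypothetical thresholds (`HOT_DAYS` and `COLD_DAYS` are illustrative values, not recommendations from any vendor):

```python
from datetime import date

# Hypothetical age thresholds in days; tune them to how far back queries usually reach.
HOT_DAYS, COLD_DAYS = 7, 30

def tier_for(index_date, today):
    """Return the storage tier for a daily log index based on its age."""
    age = (today - index_date).days
    if age < HOT_DAYS:
        return "hot"     # fast SSD, queried constantly
    if age < COLD_DAYS:
        return "cold"    # cheaper disk or object storage, queried rarely
    return "delete"      # past retention
```

For example, `tier_for(date(2025, 1, 1), date(2025, 1, 20))` lands in the cold tier because the index is 19 days old.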
---
## 2. Overview of the Three Solutions
### ELK
- **Elasticsearch:** Search & storage
- **Logstash:** Collection & transform
- **Kibana:** Visualization
Architecture:
App → Filebeat → Logstash → Elasticsearch → Kibana
(Filebeat runs on each host and ships log files to Logstash)
Market share ~50%; standard for medium/large enterprises.
---
### EFK
- Same as ELK but **Fluentd** replaces Logstash.
Architecture:
App → Fluentd → Elasticsearch → Kibana
Popular in Kubernetes; ~20% share.
---
### Loki
- Label indexing only → **cheaper storage**
- Deep integration with Grafana & Prometheus
- Targets cloud-native/K8s
Architecture:
App → Promtail → Loki → Grafana
Market share ~15% (fast-growing among SMEs).
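
To see why "label indexing only" saves storage, it helps to compare what each approach actually indexes. The sketch below is a simplified model, not how either engine is implemented: the sample `logs` and the helper `query_labels` are invented for illustration.

```python
from collections import defaultdict

logs = [
    ({"app": "cart", "level": "error"}, "payment timeout for order 1001"),
    ({"app": "cart", "level": "info"},  "order 1002 created"),
    ({"app": "auth", "level": "error"}, "login failed for user bob"),
]

# Full-text style (Elasticsearch-like): every word points back to its lines.
fulltext_index = defaultdict(set)
for i, (_, line) in enumerate(logs):
    for word in line.split():
        fulltext_index[word].add(i)

# Label style (Loki-like): only the label set is indexed; lines are stored as-is.
label_index = defaultdict(list)
for i, (labels, _) in enumerate(logs):
    label_index[tuple(sorted(labels.items()))].append(i)

def query_labels(selector, contains):
    """Pick streams by label, then scan only their lines for a substring."""
    hits = []
    for stream, ids in label_index.items():
        if all(item in stream for item in selector.items()):
            hits += [logs[i][1] for i in ids if contains in logs[i][1]]
    return hits
```

Three log lines produce 11 full-text index entries but only 3 label entries. The price is paid at query time: `query_labels({"app": "cart"}, "timeout")` has to scan every line in the matching streams instead of jumping straight to the hit.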
---
## 3. Requirements of Small–Medium Teams
- **Cost-sensitive**: Monthly log cost ≤ 20% of server cost
- **Limited Ops capability**
- **Moderate volume**: 10 GB–1 TB/day
- **Simple queries**: Mainly for troubleshooting
- **Fast onboarding**
**Selection mantra:** Enough functionality, low cost, easy to maintain.
---
## 4. In-depth Comparison
### Deployment Complexity
#### ELK — Heavyweight
- **Min. HA setup**: 3 ES nodes (+ Logstash + Kibana)
- **Resource**: 14 cores, 36 GB RAM, 300 GB storage for infra alone
- **Time**: 2–3 days
- **Complexity**: ⭐⭐⭐⭐⭐
Minimal Elasticsearch config (Docker Compose fragment):

```yaml
version: '3'
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    environment:
      - node.name=es01
```

The corresponding Loki fragment:

```yaml
version: '3'
services:
  loki:
    image: grafana/loki:2.9.3
```

**Tip:** The best log system lets you find issues quickly, stay within budget, and keep your team happy.
---
### Comparison Table
| Dimension         | ELK                  | EFK                  | Loki               |
|-------------------|----------------------|----------------------|--------------------|
| Min nodes         | 5                    | 5                    | 3                  |
| Min resources     | 14 cores / 36 GB RAM | 14 cores / 36 GB RAM | 3 cores / 6 GB RAM |
| Deployment time   | 2–3 days             | 2 days               | < 1 hour           |
| Config complexity | High                 | Medium               | Low                |
| Learning curve    | Steep                | Moderate             | Gentle             |
---
### Storage Cost
#### ELK/EFK
- Full-text index → large disk use
- Default replication → doubled storage
- 100 GB/day with 30-day retention on SSD → about **¥21,600/year**
- With a hot/cold split → down to about ¥9,000/year
---
#### Loki
- Compressed chunks + label-only index → roughly 10:1 compression
- 100 GB/day with 30-day retention on SSD → about **¥3,600/year**
- 90-day retention on object storage (OSS) → about **¥1,600/year**
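
The unit prices behind these figures are not stated, but a back-of-the-envelope script shows what they imply. The SSD price of ¥1 per GB-month below is an assumption, picked only because it reproduces the Loki SSD figure exactly; the ELK size factor and the object-storage price are back-solved from the quoted totals rather than taken from any price list.

```python
def stored_gb(daily_gb, retention_days, compression_ratio=1.0):
    """Data kept on disk: raw volume in the retention window, after compression."""
    return daily_gb * retention_days / compression_ratio

def annual_cost(gb, price_per_gb_month):
    """Yearly bill in RMB for holding `gb` at a flat monthly unit price."""
    return gb * price_per_gb_month * 12

SSD_PRICE = 1.0  # RMB per GB-month; assumed, not stated in the article

# Loki on SSD: 100 GB/day, 30 days, 10:1 compression.
loki_gb = stored_gb(100, 30, compression_ratio=10)
loki_cost = annual_cost(loki_gb, SSD_PRICE)

# ELK/EFK on SSD: back-solve the on-disk size implied by 21,600 RMB/year.
elk_gb = 21_600 / 12 / SSD_PRICE
elk_factor = elk_gb / stored_gb(100, 30)   # on-disk size relative to raw volume

# Loki on object storage: back-solve the unit price implied by 1,600 RMB/year.
oss_price = 1_600 / 12 / stored_gb(100, 90, compression_ratio=10)
```

Under that assumption Loki keeps 300 GB on SSD for ¥3,600/year, while the ELK figure corresponds to about 1,800 GB on disk, or 0.6 times the raw volume with replicas included. The object-storage figure works out to roughly ¥0.15 per GB-month.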
---
### Query Performance
- **ELK/EFK**: Sub-second on hot data; 3–10 s on cold data
- **Loki**: Sub-second for exact label matches; slower for regex filters, which have to scan the log content itself
---
### Feature Completeness
| Feature           | ELK/EFK    | Loki           |
|-------------------|------------|----------------|
| Full-text search  | ⭐⭐⭐⭐⭐ | ⭐⭐           |
| Complex analytics | ⭐⭐⭐⭐⭐ | ⭐⭐⭐         |
| Visualization     | Kibana     | Grafana        |
| Alerting          | Built-in   | via Prometheus |
| Multi-tenancy     | Good       | Excellent      |
---
## 5. Practical Recommendations
- **Startups:** Loki — cost-effective, low maintenance
- **Mid-size:** Loki (troubleshooting) / EFK (analytics needs)
- **Large:** ELK — analytics & compliance
- **Hybrid:** Core business logs on ELK; infra logs on Loki
---
## 6. Migration Advice
**ELK → Loki:**
- Assess needs: mainly troubleshooting?
- Pilot dual-write to both stacks
- Train team in Grafana/LogQL
- Switch fully when stable
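
The dual-write pilot can be sketched with the standard `logging` module: one logger, two handlers, each standing in for a shipper. `ListHandler`, `elk_sink`, and `loki_sink` are hypothetical in-memory placeholders; in practice the fan-out usually lives in the collector rather than in application code.

```python
import logging

class ListHandler(logging.Handler):
    """Stand-in sink that keeps formatted records in memory."""
    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))

elk_sink = ListHandler()    # placeholder for the existing ELK shipper
loki_sink = ListHandler()   # placeholder for the new Loki shipper

log = logging.getLogger("dual_write_demo")
log.setLevel(logging.INFO)
log.propagate = False       # keep the demo output out of the root logger
log.addHandler(elk_sink)
log.addHandler(loki_sink)

log.info("order %s paid", 1001)
log.error("payment timeout")

# During the pilot, both stacks should have received the same lines.
in_sync = elk_sink.lines == loki_sink.lines
```

Comparing what each side received, as `in_sync` does here, is the kind of check worth automating before switching over fully.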
**Loki → ELK:**
- Growth now demands full-text + analytics
- Prepare for heavier Ops footprint
---
## 7. Cost Optimization
### ELK
- Hot/warm/cold separation via ILM
- Disable indexing for fields you never search
- Sample high-volume, low-value logs
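
Log sampling can happen before anything reaches the collector. Here is a minimal sketch using a standard `logging.Filter`; the class name `SampleFilter` and the "keep warnings, sample the rest" policy are assumptions for illustration, not a setting of any particular stack.

```python
import logging
import zlib

class SampleFilter(logging.Filter):
    """Keep every WARNING-or-higher record and roughly 1 in `n` of the rest."""
    def __init__(self, n):
        super().__init__()
        self.n = n

    def filter(self, record):
        if record.levelno >= logging.WARNING:
            return True
        # Hash the message so the keep/drop decision is repeatable across runs.
        return zlib.crc32(record.getMessage().encode()) % self.n == 0
```

Attach it with `handler.addFilter(SampleFilter(10))`. Hashing the message makes the result reproducible, but it also means identical messages are always kept or always dropped together; a random draw would avoid that at the cost of repeatability.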
### Loki
- Use OSS/S3 for chunks
- Set retention
- Reduce label cardinality
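
Label cardinality matters because Loki treats every unique label combination as its own stream. A quick estimate of the worst case, with made-up label values:

```python
from math import prod

def stream_count(label_values):
    """Upper bound on streams: the product of each label's distinct values."""
    return prod(len(values) for values in label_values.values())

labels = {
    "app": ["cart", "auth", "search"],
    "env": ["prod", "staging"],
    "level": ["info", "warn", "error"],
}
low = stream_count(labels)  # 3 * 2 * 3

# Adding a per-user label multiplies the stream count by the number of users.
high = stream_count({**labels, "user_id": range(10_000)})
```

With these values the count jumps from 18 to 180,000 streams, which is why identifiers such as user or request IDs are better left in the log line and filtered at query time.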
---
## 8. Final Insights
No “best” — only “most suitable.”
- **Loki** fits most SMEs, at roughly 20–30% of ELK's TCO
- **ELK** shines where logs are data assets
- Begin simple, evolve as needed
- Always pair selection with ROI analysis
---
### Deployment Complexity: EFK and Loki
#### EFK
- Lighter collector: Fluentd (Ruby) or Fluent Bit (C)
- Same Elasticsearch backend
- **Resource**: 0.2–0.5 GB RAM per Fluentd instance
- **Complexity**: ⭐⭐⭐⭐
---
#### Loki
- 1–3 nodes are enough at small scale
- **Resource**: 3 cores, 6 GB RAM
- **Time**: 0.5–1 hour
- **Complexity**: ⭐⭐