AI news

Refactoring a Class Saves 2.9G Memory in the JVM?

Honghao Wang

28 Nov 2025 — 4 min read

Data Structure Refactoring That Saved Nearly 3 GB JVM Heap Memory in Production

This article explains how we refactored a core class’s data structure from

`HashMap>` → FastUtil `Long2ObjectOpenHashMap`.

By leveraging business data characteristics—tags as small integers, sparsely distributed—we replaced `HashSet` with sorted `int[]` arrays plus binary search lookups.

Key Results

Heap drop: ~3.2 GB → 211 MB (↓ 94% memory usage)
Validated in production with dual-write checks before rollout
No performance trade-off in lookups; in many cases, lookups improved

---

Optimization in Numbers

Refactoring one class dropped JVM heap by ~3 GB. This was measured in production, not a test lab:

Verification steps:

Dual-write the old and new data structure
Compare memory metrics in real-time
Switch fully to new model after validation

Takeaway: General-purpose containers (`HashMap`, `HashSet`) and boxing can be surprisingly costly at scale.

---

1. Background

In our business pipeline, an interface filters product tags per city with complex logic:

Multi-condition `AND`/`OR` nesting
Multi-level expression trees
Runtime-only computations for some rules

Because these filters couldn’t be expressed efficiently in our search backend, we cached full product-tag data in memory keyed by city ID for real-time filtering.

⚠️ Without this cache, each request rebuilt the structure → massive temporary memory spikes.

---

2. Legacy Structure

public class RecallPlatformTagsResp extends BasePageResponse {
    private static final long serialVersionUID = 8030307250681300454L;
    private Map> resultMap;
}

Example:

[
    {1:["2246","2288","3388","1022"]},
    {2:["12246","12288"]}
]

Problem: At scale, nested `HashMap` + `HashSet` → huge overhead from:

Object headers
Boxing/unboxing
Hash table arrays
Node pointers

---

3. Tag Distribution Analysis

From 1 million sample items:

80% ≤ 16 tags
90% ≤ 32 tags
99% ≤ 64 tags
Max tags < 128

Implications:

Tags are bounded and small → arrays suffice
Data is immutable after init → binary search works
Tags are integers → `int[]` avoids `String` + object overhead

---

Visual Memory Impact:

---

4. New Structure Choices

Alternatives tried:

FastUtil primitive maps (`Long2ObjectOpenHashMap`)
Sorted `int[]` + binary search
RoaringBitmap
Custom compact object pool

Test metrics: memory footprint, lookup speed, init cost, code complexity

---

Best Combo:

Key: FastUtil `Long2ObjectOpenHashMap` → no boxing, tighter layout
Value: sorted `int[]` → tiny footprint, fast binary search

Memory Effect:

Replace value (`HashSet` → `int[]`) in JDK `HashMap` → 1.3 GB → 64 MB
Then replace JDK map with FastUtil → 25 MB

---

Why `int[]` over `HashSet`:

|------------------------|------------|--------|-------|

With k ≤ 128:

≤ 7 comparisons worst-case
90% of lookups ≤ 5 comparisons
Negligible latency impact; massive memory savings

---

5. Test Harness

JProfiler memory dumps compared:

`Map>`
`HashMap`
FastUtil `Long2ObjectOpenHashMap`
`BitSet`
RoaringBitmap

import it.unimi.dsi.fastutil.longs.Long2ObjectOpenHashMap;
...

(full Java test code above)

---

Result: Long2ObjectOpenHashMap is optimal

public class RecallPlatformTagsResp extends BasePageResponse {
    ...
    private Map> resultMap; // legacy dual-write
    private Long2ObjectOpenHashMap itemTags = new Long2ObjectOpenHashMap<>(1_600_000);
}

---

6. Why FastUtil `Long2ObjectOpenHashMap`

Advantages

Primitive keys — no boxing (`long` direct)
Open addressing — faster, cache-friendly
Compact arrays — no linked lists or tree nodes

Internals:

long[] key;    // primitive keys
Object[] value;
mask = capacity - 1; // power of two

Hash: `(int)(key ^ (key >>> 32)) & mask`

---

Collision Handling

Linear probing: place colliding keys in next free slot

get(): probe until match or empty slot

remove(): use Backward Shift algorithm — slide items back to fill gaps, avoid breaking probe chains

---

Performance Notes

Load factor ≤ 0.7 recommended
Avoid sequential keys causing clustering
Initialize with estimated capacity

  new Long2ObjectOpenHashMap<>(16384);

---

Suitable Use Cases

✅ High-frequency `long → Object` lookups, GC-sensitive systems

❌ Concurrent writes without external sync; highly skewed keys

---

Final Summary

Before:

HashMap>

After:

FastUtil Long2ObjectOpenHashMap

Outcome:

Memory ↓ 94%
Speed maintained
GC pressure reduced

---

Key Takeaways:

Generic containers can hide massive overhead at scale
Let data characteristics drive structure choice
Simple primitives + targeted algorithms beat general-purpose designs for bounded, read-mostly datasets

Refactoring a Class Saves 2.9G Memory in the JVM?

Honghao Wang

Data Structure Refactoring That Saved Nearly 3 GB JVM Heap Memory in Production

Key Results

Optimization in Numbers

1. Background

2. Legacy Structure

3. Tag Distribution Analysis

4. New Structure Choices

5. Test Harness

6. Why FastUtil `Long2ObjectOpenHashMap`

Advantages

Internals:

Collision Handling

Performance Notes

Suitable Use Cases

Final Summary

Read more

Xiaoyuan Learning Tablet Wins 2025 IDEA International Design Award, Setting a New Benchmark for Study Devices

Translate the following blog post title into English, concise and natural. Return plain text only without quotes. 哈佛大学 R 编程课程介绍

Cloud Computing Giant Unveils 25 New Products in 10 Minutes — Kimi and MiniMax Debut

TopGear Picks 18 Cars of the Year, Only One from China

Data Structure Refactoring That Saved Nearly 3 GB JVM Heap Memory in Production

Key Results

Optimization in Numbers

1. Background

2. Legacy Structure

3. Tag Distribution Analysis

4. New Structure Choices

5. Test Harness

6. Why FastUtil `Long2ObjectOpenHashMap`

Advantages

Internals:

Collision Handling

Performance Notes

Suitable Use Cases

Final Summary

Read more

Xiaoyuan Learning Tablet Wins 2025 IDEA International Design Award, Setting a New Benchmark for Study Devices

Translate the following blog post title into English, concise and natural. Return plain text only without quotes. 哈佛大学 R 编程课程介绍

Cloud Computing Giant Unveils 25 New Products in 10 Minutes — Kimi and MiniMax Debut

TopGear Picks 18 Cars of the Year, Only One from China

Data Structure Refactoring That Saved Nearly 3 GB JVM Heap Memory in Production