yosinago 3 days ago [-]
Hey everyone,
I recently wanted to see how far I could push the JVM on a classic I/O problem: parsing a 1-million line log file to extract error counts per hour.
I started with the most naive, readable Spring-tutorial style approach (Files.readAllLines + Stream API). The baseline: 872ms and 19 GC pauses.
I decided to rewrite it step-by-step to understand the exact bottlenecks. In my final iteration (V04), I brought the execution time down to 78ms with zero GC allocations, running on a single thread (Intel Core i5).
How I did it:
Off-Heap Memory: I dropped String allocations entirely and used the FFM API (MemorySegment), memory-mapping the file directly via FileChannel.
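A minimal sketch of that mapping step (assuming Java 22+, where the FFM API is final; the class and method names here are illustrative, not the repo's):

```java
import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class LogMapper {
    // Map the whole file into off-heap memory. The parser then walks the
    // MemorySegment by raw byte offset, never allocating a String per line.
    static MemorySegment mapReadOnly(Path path, Arena arena) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            // The mapping stays valid until `arena` is closed; no heap copy is made.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size(), arena);
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("demo", ".log");
        Files.writeString(p, "ERROR 2024-01-01T10:15 boom\n");
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = mapReadOnly(p, arena);
            System.out.println(seg.byteSize()); // prints the file length in bytes
        }
    }
}
```

A confined arena ties the mapping's lifetime to one thread, which fits the single-threaded design.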
SWAR (SIMD Within A Register): Instead of reading byte-by-byte, I used ValueLayout.JAVA_LONG_UNALIGNED to load 8 raw bytes per read. Since log lines are variable length, the loads aren't aligned, but modern x86 handles unaligned access beautifully.
Bitwise Operations: I used bitmasks to locate the \n character across all 8 byte lanes simultaneously, and Long.numberOfTrailingZeros to pinpoint the exact offset.
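The two steps above can be sketched like this, using the classic SWAR zero-byte trick: XOR turns every '\n' lane into 0x00, and (x - 0x01…01) & ~x & 0x80…80 sets the high bit of the lowest zero lane. This assumes a little-endian host (as on the x86 CPU mentioned above); names are illustrative, not the repo's:

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

class SwarScan {
    private static final long NL = 0x0A0A0A0A0A0A0A0AL; // '\n' in all 8 lanes

    // Returns the byte index (0-7) of the first '\n' in the 8 bytes at
    // `offset`, or -1 if none of the eight lanes contains a newline.
    static int firstNewline(MemorySegment seg, long offset) {
        long word = seg.get(ValueLayout.JAVA_LONG_UNALIGNED, offset); // 8 bytes, any alignment
        long x = word ^ NL;                                           // '\n' lanes become 0x00
        long hits = (x - 0x0101010101010101L) & ~x & 0x8080808080808080L;
        return hits == 0 ? -1 : Long.numberOfTrailingZeros(hits) >>> 3;
    }

    public static void main(String[] args) {
        MemorySegment seg = MemorySegment.ofArray("ab\ncdefghij".getBytes());
        System.out.println(firstNewline(seg, 0)); // prints 2: newline at byte index 2
    }
}
```

Dividing the trailing-zero count by 8 (the `>>> 3`) converts the bit position of the flagged high bit back into a byte-lane index.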
BCE (Bounds Check Elimination): By extracting the hour as an integer and indexing a 32-slot table with a bitmask (hour & 0x1F), the JIT compiler could prove the index is always within [0, 31], completely eliminating the array bounds check.
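An illustrative sketch of that masked-index trick (my own minimal version, not the repo's code): because the table has exactly 32 slots and the index is masked with 0x1F, the JIT can see the index can never exceed the array length. Slots 24-31 are padding, since valid hours are 0-23:

```java
class HourCounts {
    // 32 slots so that (hour & 0x1F) is provably a valid index.
    static final long[] COUNTS = new long[32];

    static void record(int hour) {
        COUNTS[hour & 0x1F]++; // index provably in [0, 31]: no range check emitted
    }

    public static void main(String[] args) {
        record(13);
        record(13);
        System.out.println(COUNTS[13]); // prints 2
    }
}
```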
It was an amazing journey into mechanical sympathy and understanding what Java does under the hood.
If you're interested in performance engineering, I documented the entire journey (from V01 to V04, isolating each bottleneck and explaining the JVM/hardware impact) in the repository.
I'd love to hear your thoughts or if you see any further micro-optimizations I might have missed!