Improving Date Formatting Performance in Node.js

Some months ago, I was investigating why a particularly large response in a Node.js application was taking too much time to produce. The application was an aggregator for movie showtimes that allowed users to see relevant showtimes based on selected filters. In some cases, many results could be returned which made the response unusually slow. This is how I went about investigating the issue and what came out of it.

Profiling Node.js

Node.js has had a --prof option for quite some time, which generates a text file showing which functions took the most time while your application ran. However, it doesn’t always work well, as much of the CPU time can end up marked as “unaccounted”. More recent versions of Node.js provide another option, --cpu-prof. When run with this flag, node writes a .cpuprofile file that can be loaded into Chrome DevTools to visually inspect where time is being spent in your code. Using DevTools, I profiled the application, looking specifically at the Bottom-Up tab sorted by Self Time. This shows which functions do the most work in and of themselves (as opposed to Total Time, which includes time spent in the functions they call).

What’s slow

It turned out there were several libraries that the application depended on that had performance issues, but the most prominent bottleneck was date and time formatting.

I used the Luxon library for date and time handling in this project, particularly for time zone support. In order for Luxon to get the offset of a particular time zone for a given datetime, it resorts to the Intl.DateTimeFormat API. This API provides the ability to format any datetime value in a locale-specific manner with many options to customize the output. It also allows you to format a datetime in a given time zone.

Since there is no native way in JavaScript to get the offset of a time zone (yet), libraries resort to this feature to essentially format the datetime in a given time zone, then parse the components of the formatted version and calculate the timestamp from that. By comparing this to the known UTC timestamp, the offset can be found. Altogether, a rather expensive operation.
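The format-then-parse trick described above can be sketched roughly as follows. This is a minimal illustration, not Luxon's actual code: getOffsetMinutes is a hypothetical helper name, and it assumes a Node.js build with full ICU data and dates in the common era.

```javascript
// Hypothetical helper: returns the UTC offset, in minutes, of `timeZone`
// at the instant represented by `date`. Not Luxon's implementation.
function getOffsetMinutes(timeZone, date) {
  const fmt = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hourCycle: "h23",
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
    second: "2-digit",
  });

  // Collect the numeric wall-clock components as seen in the target zone.
  const parts = {};
  for (const { type, value } of fmt.formatToParts(date)) {
    if (type !== "literal") parts[type] = Number(value);
  }

  // Pretend the wall-clock components are UTC and build a timestamp from them.
  // `% 24` guards against engines that report midnight as hour 24.
  const asUtc = Date.UTC(
    parts.year, parts.month - 1, parts.day,
    parts.hour % 24, parts.minute, parts.second
  );

  // The formatted output has no milliseconds, so drop them from the real timestamp too.
  const actual = Math.floor(date.getTime() / 1000) * 1000;

  return (asUtc - actual) / 60000;
}

console.log(getOffsetMinutes("Asia/Kolkata", new Date("2024-01-15T12:00:00Z")));
```

Every call builds a formatter, formats the date, and parses the result, which is why this approach becomes costly when repeated for thousands of datetimes.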

From Node.js to ICU

Since the format method of Intl.DateTimeFormat is implemented using native C++ code, it won’t show up in the Chrome profiler, only its caller. As I’m using macOS, I usually go for the Instruments profiler that comes with Xcode to profile native code.

Using Instruments, and specifically the Time Profiler, showed that Intl.DateTimeFormat was implemented in the V8 engine used by Node.js, and it did little more than call the ICU library — the International Components for Unicode. This is the library that implements the Unicode standard, and is used by most operating systems and browsers. While generally well-optimized, it has a vast surface area so improvements can always be made.

Making ICU faster

The first step was to write a simple benchmark in C++ that did nothing more than format a datetime several thousand times in a loop. I ran that through Instruments and looked at the results. Here too, the bottom-up feature (Invert Call Tree in Instruments) and self time (Self Weight) are most useful. After some trial and error, it turned out there were problems in essentially four areas:

  • Memory allocations
  • Floating-point operations
  • Unoptimized hot paths
  • Missing fast paths

Memory allocations

This was by far the biggest culprit in the slow formatting performance. ICU would heap-allocate (malloc) an object for every numeric component formatted in a date (e.g. day, hour, minute) and free it soon after. A datetime might have six components: year, month, day, hour, minute and second. That’s six memory allocations per formatting operation; multiply that by a few hundred operations and it adds up quickly. One additional allocation was used for a calendar object that had to be cloned for each formatting operation. Eliminating all those allocations and using the stack instead provided a significant performance boost right off the bat.

Floating-point operations

This was a surprise to me, but while profiling I saw fmod show up a lot. This function computes the floating-point remainder, similar to % for integers. It’s not surprising for calendar calculations to rely heavily on modular arithmetic, but surely there are no floats in dates? Indeed there aren’t, and converting fmod to a regular % and ensuring integers are used throughout provided another performance boost.

Unoptimized hot paths

When formatting a component in a datetime, such as the hour, the formatting code needs to get the relevant information for that particular pattern character (e.g. H in an HH pattern). This was done by looping through an array of pattern characters until the matching one was found, and then using that index to look up the information in a second array. This was changed to a simple lookup table, so converting a pattern character to an index takes O(1) time rather than O(N).
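To make the difference concrete, here is a small JavaScript sketch of the two approaches. ICU's real code is C++, and the pattern characters and field names below are a made-up subset chosen only for illustration.

```javascript
// Illustrative subset only; these are not ICU's real tables.
const PATTERN_CHARS = ["y", "M", "d", "H", "m", "s"];
const FIELD_INFO = ["year", "month", "day", "hour", "minute", "second"];

// Before: scan the pattern characters until one matches, O(N) per lookup.
function fieldInfoLinear(ch) {
  const index = PATTERN_CHARS.indexOf(ch);
  return index === -1 ? undefined : FIELD_INFO[index];
}

// After: a table indexed by ASCII code, built once. -1 means "not a pattern character".
const CHAR_TO_INDEX = new Int8Array(128).fill(-1);
PATTERN_CHARS.forEach((ch, i) => {
  CHAR_TO_INDEX[ch.charCodeAt(0)] = i;
});

// O(1) per lookup: one bounds check and one array read.
function fieldInfoTable(ch) {
  const code = ch.charCodeAt(0);
  const index = code < 128 ? CHAR_TO_INDEX[code] : -1;
  return index === -1 ? undefined : FIELD_INFO[index];
}

console.log(fieldInfoLinear("H"), fieldInfoTable("H"));
```

Both functions return the same result for every input; the table version simply trades a few bytes of memory for a constant-time lookup.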

Missing fast paths

At the core of the ICU library is the UnicodeString object, which is used for anything string-related. During a formatting operation, a string is repeatedly appended to. For small strings UnicodeString uses a buffer on the stack, while for large strings it allocates on the heap. When appending, it uses memmove. Date formatting deals almost exclusively with short strings, so it’s prudent to add a fast path for appends that fit into the existing stack buffer. In that case, avoiding the call to memmove and doing a simple unrolled loop copy proved to be noticeably faster.
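The shape of such a fast path can be sketched in JavaScript as follows. This only illustrates the control flow: SmallString, toUnits and the capacity of 27 are hypothetical stand-ins, the loop is not unrolled, and no claim is made that this particular split is faster in JavaScript.

```javascript
// Assumed size of the inline buffer, for illustration only.
const INLINE_CAPACITY = 27;

// Hypothetical stand-in for a string type with a small inline buffer.
class SmallString {
  constructor() {
    this.buffer = new Uint16Array(INLINE_CAPACITY); // plays the role of the stack buffer
    this.length = 0;
  }

  // `src` is a Uint16Array of UTF-16 code units.
  append(src) {
    const newLength = this.length + src.length;
    if (newLength <= this.buffer.length) {
      // Fast path: the result fits in the existing buffer, so copy unit by unit.
      for (let i = 0; i < src.length; i++) {
        this.buffer[this.length + i] = src[i];
      }
    } else {
      // Slow path: grow the storage (the "heap" case), then bulk copy.
      const grown = new Uint16Array(Math.max(newLength, this.buffer.length * 2));
      grown.set(this.buffer.subarray(0, this.length));
      grown.set(src, this.length);
      this.buffer = grown;
    }
    this.length = newLength;
    return this;
  }

  toString() {
    return String.fromCharCode(...this.buffer.subarray(0, this.length));
  }
}

// Helper for the example: converts a string to UTF-16 code units (BMP characters only).
const toUnits = (s) => Uint16Array.from(s, (c) => c.charCodeAt(0));

console.log(new SmallString().append(toUnits("12")).append(toUnits(":30 PM")).toString());
```

A formatted time such as "12:30 PM" never leaves the fast path, which is the common case the optimization targets.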

Final result

With all these changes, formatting is now about twice as fast, and sometimes more. But is it as fast as it can get? I decided to compare with the reliable strftime in the C standard library. On my 2016 MacBook Pro, strftime with a simple %I:%M %p format (hour:minute am/pm) can do a million formatting operations in around 630 ms. Doing the same with ICU 74.2 (before the optimizations, using the equivalent hh:mm a format) took almost three times as long, at 1700 ms. With the recently released ICU 75.1, which includes all the optimizations mentioned, it takes around 800 ms. That is more than twice as fast as before, though still roughly 1.3 times slower than strftime.

Back to Node.js

With the recently released versions 20.13 and 22.1, these changes have made their way back to Node.js. The official releases from the Node.js website (such as the ones installed via nvm) bundle the ICU library in the same binary. If you get Node.js from your package manager, it might use the system ICU library, which might not be the latest version. To check which version Node.js uses, run node -p process.versions.icu. To try the changes out, I ran a simple benchmark comparing 20.13 (ICU 75) and 18.20 (ICU 74). First, let’s check whether Intl.DateTimeFormat is faster as expected:

const fmt = new Intl.DateTimeFormat("en-US", {
  hour: "2-digit",
  minute: "2-digit",
  hour12: true,
});
const date = new Date();
fmt.format(date); // Warmup - the first run is much slower
console.time("format");
for (let i = 0; i < 1_000_000; i++) fmt.format(date);
console.timeEnd("format");

Running this script, I get 2100 ms for Node.js 18 and 1300 ms for Node.js 20 — a 1.6x improvement.

For something closer to production use, let’s try ten thousand calls to Date.prototype.toLocaleString, which includes both date and time fields, and ensure that it’s thoroughly warmed up, simulating a long-running application:

const d = new Date();
for (let i = 0; i < 100_000; i++) d.toLocaleString(); // Warmup
console.time("format");
for (let i = 0; i < 10_000; i++) d.toLocaleString();
console.timeEnd("format");

In this case, Node.js 18 takes 36 ms and Node.js 20 takes just 18 ms, a 2x improvement.