For the longest time, I've always thought that JavaScript generators perform better than chained array methods in terms of execution time.
Surely, I thought that the superior memory efficiency of generators (due to lazy computation) implied better overall execution time as well.
Surely, in terms of time complexity, the single-pass
O(n)
nature of generators were superior over the
O(kn)
of
k
chained array methods.
All of the heuristics that I had been taught by computer science point to generators being the undisputed champion. After many years of personal presumption and dogma, it's finally time to put that to the test—with statistical rigor of course! So strap on your seatbelts as we finally answer the question:
Are generators actually faster than chained array methods?
The Setup
When we chain array methods, we typically perform a filter first then the map. The main idea is that it is more efficient to filter out the bad data first so that we don't have to map as many values later on. This will be our primary experiment: a basic filter + map operation.
Let's start with generating 32 KiB of random data.
// Generate 32 KiB of "cyptographically strong random values".constbytes=crypto.getRandomValues(newUint8Array(1<<15));// Then save the buffer into a file.// Deno is just an implementation detail.Deno.writeFileSync('random.bin',bytes);
Below is a histogram of the generated bytes. Indeed, we see a fairly uniform distribution of byte values in the random.bin file. For the sake of reproducibility and independence, we will now use this binary dump for all experiments.
The goal is to simply prune out all of the bytes greater than 127 (roughly half of random.bin) and then normalize the data so that the remaining bytes are between 0.0 and 1.0 (inclusive). Let's now discuss the four cases that we will be testing.
Case 1: raw-for-loop
By presumption, we consider the raw for loop construction as the baseline. Here, we iterate over all values in nums and push the mapped value into the output array.
A slight variation of the raw-for-loop case is the for-of-loop case, which instead uses the for...of loop to iterate over nums. Everything else remains the same.
Before we run the experiments, let's get the hardware specifications out of the way. Note that future work should reproduce this experiment on native Linux systems.
OS: Windows 22H2 (Build 19045.4842)
CPU: Intel® Core™ i7-10750H CPU @ 2.60GHz (12 CPUs)
RAM: 8192 MB
Now we implement a CLI for running each experiment case. Here is a high-level overview of the harness logic:
Select one of the four implementations: either raw-for-loop, for-of-loop, array-method, or generator.
Reads the 32-KiBrandom.bin file for the cached random bytes.
Perform 2048 warm-up iterations to allow the just-in-time compiler to optimize the code.
Finally, run the actual experiment with high-resolution sub-millisecond timing for 16384 iterations.
When the standard output is piped into a file, the result is 16384 lines of timing data.
I'll spare you the gory details of the harness implementation. For those who are interested in the argument parser, the file loader, the implementation selector, and the experiment runner, you may visit the GitHub repository. Please do leave a star while you're there!
The repository for all the scripts, notebooks, results, and analyses related to my experiments on JavaScript generators and chained array methods.
JavaScript Generators vs. Array Methods
The repository for all the scripts, notebooks, results, and analyses related to my experiments on JavaScript generators and chained array methods. See the full article "I was wrong about array methods and generators..." for more details on the methodology, results, interpretations, and conclusions.
Generating Random Bytes
The first step is to generate 32 KiB of random data. There are many ways to do this, but I opted to generate them via the Crypto#getRandomValues function in JavaScript. A Deno script is already available at random/random.js.
# This will overwrite the already existing `random.bin` file!
deno run random/random.js
Running the Experiments
# Node.jsfunctionrun () { node --experimental-default-type=module benchmark/bench.node.js $1> notebook/node/$1.csv; }
run raw-for-loop
run for-of-loop
run array-method
run generator
Next, on which runtimes shall we run the experiments? I've selected the three most prominent JavaScript runtimes as of writing.
Node.js (v22.8.0) is written in C++ and powered by V8.
Deno (1.46.3) is written in Rust and powered by V8.
Bun (1.1.26) is written in Zig and powered by JavaScriptCore.
The Results
With 2048 warm-up iterations and 16384 actual samples, I believe it is fair to assume that sample sizes in the experiment are sufficiently large enough to omit the analyses of p-values. After all, with so many samples available, the observed data as a whole is hardly a fluke at that point.
Instead, the following results and discussion will focus on the pairwise effect size of the sample distributions. Given two sample distributions (e.g., raw-for-loop versus array-method), the effect size quantifies how "far apart" the mean of raw-for-loop is when compared to that of array-method.
For example, a large negative effect size implies that raw-for-loop has a significantly lower execution time (good!) than array-method. In practice, we are effectively quantifying the "impact" of the array-method on our baseline (i.e., raw-for-loop).
To keep things simple, we will use Cohen's
d
as our metric for effect size. As a general rule of thumb, Cohen suggests (but warns against the dogmatization of) the following interpretations for
d
:
We say that the effect size is small when
∣d∣<0.2
.
But if
∣d∣<0.5
, then we consider that to be a medium effect.
But if
∣d∣<0.8
, then we have a large effect.
Otherwise,
∣d∣≥0.8
is a huge effect.
Now that we're all on the same page, we can examine the results of the experiment.
Node.js Results
Right off the bat, the Node.js results demolish my long-held presumptions. Not only am I wrong, but it turns out that generators are the slowest of all the cases!
This histogram still sends shivers down my spine as I recall the countless number of times that I've written generators in my code under the presumption of optimization.
But how bad is it really? Let's take a look at the numbers.
Experiment
Mean
Mode
Standard Deviation
raw-for-loop
0.3309
0.2828
0.1975
for-of-loop
0.3368
0.2873
0.2031
array-method
0.4360
0.3557
0.2404
generator
1.1024
1.0010
0.2134
By eyeball statistics, we can see right away that the differences between raw-for-loop (baseline) and for-of-loop are insignificant. In fact, we can exactly quantify that with Cohen's
d
. The table below summarizes how the raw-for-loop compares against the other experiment cases.
d
raw-for-loop
for-of-loop
-0.0296
array-method
-0.4778
generator
-3.7518
Let's take a look at some key observations here.
Comparing raw-for-loop and for-of-loop, we obtain
d≈−0.0296
. That means raw-for-loop is only marginally faster than for-of-loop up to a very small effect—so small in fact that it is an injustice to say that there is one!
We get into the medium territory between raw-for-loop and array-method with
d≈−0.4778
. That is to say, the raw-for-loop is faster than the for-of-loop up to a non-trivial medium effect. These results are consistent with that by various authors and blog posts over the years.
Perhaps the most shocking result is the extreme effect of generator on raw-for-loop having
d≈−3.7518
! Given these plots, it is fairly conclusive that generators significantly penalize execution time.
Going back to the original question, though: how badly do generators compare to chained array methods? It turns out that the effect size between array-method and generator is
d≈−2.9315
! Once again, we have entered extreme territory.
Deno Results
Hmm... that plot looks eerily familiar. As a matter of fact, we have already seen this plot in the previous section on Node.js!
Because Node.js and Deno both share V8 as their underlying JavaScript engine, the performance differences between the two are not statistically significant. From this point forward, we can thus refer to Node.js and Deno interchangeably as if they fall under the one umbrella of V8.
For the sake of completeness, here are the same tables shown in the Node.js section, but in the context of Deno instead.
Experiment
Mean
Mode
Standard Deviation
raw-for-loop
0.3267
0.2828
0.1857
for-of-loop
0.3440
0.2963
0.1987
array-method
0.4221
0.3462
0.3949
generator
1.0946
1.0180
0.1843
d
raw-for-loop
for-of-loop
-0.0898
array-method
-0.3092
generator
-4.1505
Note that the effect size between array-method and generator is
d≈−2.1824
: yet another extreme effect on display.
Node.js and Deno: One and the Same
Let's go back to the observation earlier that Node.js and Deno exhibit the same performance characteristics. The following figure superimposes the Deno histogram over the Node.js histogram. The overlap is quite impressive to say the least.
Now let's quantify exactly how much they overlap by considering the effect sizes of corresponding experiment cases.
Experiment
d
raw-for-loop
0.0219
for-of-loop
-0.0356
array-method
0.0426
generator
0.0388
In all cases, there is little to no effect between Node.js and Deno. If we must draw a conclusion, though, we can see that Node.js is marginally slower than Deno except in the for-of-loop case. Nevertheless, this is already well within the realm of noise, so let's not look too deep into it.
Bun Results
As the JavaScriptCore engine enters the scene, the Bun histogram takes on a completely different shape. I can already hear the Zig fans rejoicing as their results practically blow Node.js and Deno out of the water.
A cursory inspection of the histograms already shows more stable performance characteristics in JavaScriptCore than in V8. The curves are closer together, but still exhibit the same patterns as in Node.js and Deno—albeit at a smaller magnitude. The table below summarizes the relevant statistics from the experiments.
Experiment
Mean
Mode
Standard Deviation
raw-for-loop
0.2786
0.2626
0.2749
for-of-loop
0.2789
0.2644
0.2718
array-method
0.3608
0.2707
0.3274
generator
0.4919
0.4712
0.2651
Rather than rehashing the discussion as in the Node.js and Deno sections, let's instead examine the Bun data based on its similarities and differences from the previous two runtimes.
Like Node.js and Deno, the performance degradation with respect to the raw-for-loop baseline is ordered as for-of-loop, array-method, and generator. This phenomenon where generator performs worst—much to my dismay and chagrin—is consistent with all engines.
Unlike that of Node.js and Deno, the Bun distributions are closer together—indicating more stable performance characteristics.
Bun (JavaScriptCore) optimizes the underlying state machines of the generator implementation better than Node.js and Deno (V8). By how much, you may ask? Between Node.js and Bun, the effect size is
d≈2.5370
. Meanwhile, between Deno and Bun, the effect size is
d≈2.6401
. That is to say, both Node.js and Deno perform extremely worse than Bun.
The next table summarizes the effect sizes of each experiment on the raw-for-loop baseline.
d
raw-for-loop
for-of-loop
-0.0012
array-method
-0.2722
generator
-0.7898
Like Node.js and Deno, the for-of-loop is practically equivalent in performance characteristics to the raw-for-loop baseline.
Like Node.js and Deno, a similarly small effect size has been observed between raw-for-loop and array-method. Indeed, the performance overhead is small but certainly non-zero.
Unlike Node.js and Deno, the overhead of the generator case only tiptoes between the medium and large effect sizes, whereas the former exhibited extreme effect sizes. This result attests to the previously discussed stability of Bun's performance characteristics.
Now how do array methods and generators compare in Bun? Between array-method and generator, the effect size is medium with
d≈−0.4399
.
Interpretations and Conclusions
The most glaring conclusion thus far is that I was comically wrong about generators and chained array methods. In all experiments across all runtimes, the generator case consistently performed the worst. To add salt to the wound, I was not even wrong by a small margin, but by an extreme margin! Truly a hard lesson learned: measure, measure, measure.
The Importance of Spatial and Temporal Locality
I suspect that these results are due to the heavily optimized C++ implementations that underpin filter and map. Specifically, since arrays are contiguous data structures, low-level caches and memory prefetchers (in hardware and software) exploit the spatial and temporal locality of the array operations. In other words, a tight loop over 32768 integers becomes light work for hardware that can fetch and process 32 KiB in bulk without invalidating its internal caches.
Consider the alternative: generators. Generators are implicit state machines that tick forward after every yield. There is no notion of contiguity here because the values that will be processed are yet to be yielded by the generator. That alone fundamentally hampers the spatial and temporal locality of array operations. The state machine must necessarily tick one iteration at a time. This is opposed to the C++ code of filter and map that can zoom through the ready data in one fell swoop—albeit in two iterations.
In practice, this means that the
O(2n)
time complexity of filter + map does not tell the whole story. Although generators are single-pass
O(n)
constructs, sacrificing spatial and temporal locality incurs an extreme overhead that more than makes up for the two-pass bulk operation of filter + map.
As we chain more array methods (which is rarely necessary), we anticipate array-method eventually surpassing generator in execution times.1 Until then, we have shown that the overhead of generators due to (implicit state machines) proves to be too costly.
Some part of me is quite relieved that this is the case. I must admit that filter + map chains read more elegantly than generators (even with helpers from itertools). At least I can now rest assured that generator gymnastics need not apply.
V8 versus JavaScriptCore
An unexpected storyline that unraveled as I performed the experiments was the lopsided battle between V8 and JavaScriptCore. I have heard about Bun's supposedly superior performance versus Node.js and Deno, but I have only experienced it in the making of this article. The effect sizes are less impressive in the raw-for-loop, for-of-loop, and array-method cases. Where Bun really shined was with the generator case and the overall stability of its performance characteristics.
Frankly, I am quite impressed by Bun and especially JavaScriptCore. To beat the formidable V8 engine by extreme margins is no small feat. For that, I will continue to keep a close eye on Bun. I look forward to seeing how far we can push the boundaries of JavaScript.
So... when to use generators then?
Generators still have their place where lazy computation is a necessity due to memory constraints. One notable example is file streaming, where it's impractical to load an entire file or response into memory (as is typically the case in cloud environments).
More generally speaking, generators shine in environments that are bottlenecked by idle times due to I/O (e.g., file system, networks, user inputs, streams, etc.). Here, CPU performance is less relevant because the CPU would be idle waiting on I/O to respond anyway. The lazy computation keeps memory usage low while data streaming ensures that clients receive data as soon as possible.
However, there is another compelling reason to use generators: composability. Because generators are lazily executed, operations on the resulting data stream are easily composable as items are processed one at a time. The popular itertools package—which features a plethora of iteration utilities—is a testament to this fact.
In fact, the developer experience around iteration utilities is so great that there is an active Stage 3 ECMAScript proposal right now (as of writing) that aims to officially add lazy iterator helpers to the language. When this is more widely supported, expect a follow-up article to this one that investigates the would-be C++-backed helpers.
Future Work
As thorough as I have been in this research, there is still some more work to be done. Here are a few bullets that were previously mentioned in the article.
Reproduce these experiments on a native Linux machine.
Investigate whether a sufficiently long chain of array methods eventually surpass the execution time of the equivalent generator.
Compare the performance characteristics of the upcoming iterator helpers.
The repository for all the scripts, notebooks, results, and analyses is available below.
The repository for all the scripts, notebooks, results, and analyses related to my experiments on JavaScript generators and chained array methods.
JavaScript Generators vs. Array Methods
The repository for all the scripts, notebooks, results, and analyses related to my experiments on JavaScript generators and chained array methods. See the full article "I was wrong about array methods and generators..." for more details on the methodology, results, interpretations, and conclusions.
Generating Random Bytes
The first step is to generate 32 KiB of random data. There are many ways to do this, but I opted to generate them via the Crypto#getRandomValues function in JavaScript. A Deno script is already available at random/random.js.
# This will overwrite the already existing `random.bin` file!
deno run random/random.js
Running the Experiments
# Node.jsfunctionrun () { node --experimental-default-type=module benchmark/bench.node.js $1> notebook/node/$1.csv; }
run raw-for-loop
run for-of-loop
run array-method
run generator