A treatise on JavaScript dependencies

Zach Schneider - Aug 18 '20 - Dev Community

JavaScript dependency trees are a bit of a punching bag in the programming world. Even in a small project, the node_modules directory can easily reach hundreds of megabytes, much to the chagrin of engineers who remember when an entire hard drive might not even hold 100MB. A brand-new create-react-app project comes with 237MB of node_modules at the time of this writing. There are even memes about this phenomenon:

Heaviest objects in the universe: sun, neutron star, black hole, node_modules

As you might expect, the topic also comes up regularly in discussion forums. A recent Hacker News thread wondered why a new Rails app (with a webpack toolchain) brings along 106MB in JavaScript dependencies. So what gives? Do JavaScript programmers just love installing libraries? To answer this question, we need to start with a bit of recent history.

The JavaScript standard library

If you were programming for the web in 2016, you probably recall the infamous left-pad fiasco. TL;DR: an engineer who was unhappy with npm decided to unpublish all of his packages in protest. One of these packages, left-pad, was an 11-line helper to pad a string with spaces up to a certain length. The package was widely used (whether as a direct dependency or as an indirect dependency-of-a-dependency), so its removal broke a lot of popular packages and application builds, causing much weeping and gnashing of teeth. npm added limits on unpublishing packages to keep the situation from recurring, but the incident shone a spotlight on a broader problem in the JavaScript world: why did hundreds of packages depend on a tiny package just to pad a string?

The problem really starts with JavaScript's standard library, especially as it stood 5-10 years ago. When faced with a solved-but-sort-of-tricky problem like string padding, programmers naturally take the path of least resistance, which usually involves Googling a solution. They're focused on solving bespoke business-logic problems and rarely want to go down the rabbit trail of writing a custom string manipulation library. A Ruby programmer would quickly discover the built-in rjust method on strings, a Python programmer would discover Python's identically named str.rjust, and a PHP programmer would find the helpful str_pad function. But a JavaScript programmer in 2016 would have found... the left-pad library. JavaScript didn't have a built-in way to pad a string, nor did it offer many other convenience functions that we take for granted in other languages. The existence of underscore and lodash is evidence in itself: both packages bundle dozens of convenience functions that come for free in the standard library of most high-level languages.
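To make the gap concrete, here is a rough sketch of the kind of helper JavaScript developers had to write, or install, before the language had built-in padding. It is written in ES5 style and is not the actual left-pad source; the name `leftPad` and its argument order are assumptions chosen to mirror the description above.

```javascript
// Hypothetical ES5-era helper in the spirit of left-pad (not its actual source).
// Left-pads `str` with the first character of `ch` (a space by default)
// until the result is at least `len` characters long.
// Strings that are already long enough are returned unchanged.
function leftPad(str, len, ch) {
  str = String(str);
  ch = ch === undefined ? " " : String(ch).charAt(0);
  if (ch === "") {
    return str; // an empty pad character would otherwise loop forever
  }
  var result = str;
  while (result.length < len) {
    result = ch + result;
  }
  return result;
}

console.log(leftPad("5", 3, "0")); // prints 005
console.log(leftPad(42, 5, 0));    // prints 00042
```

Tiny as it is, a helper like this still has edge cases to get right (non-string input, empty pad characters), which goes some way toward explaining why so many people reached for a package instead.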

Now, this piece of the problem has improved substantially since 2016. If you search for how to left-pad a string in JavaScript today, you're quickly pointed to the built-in padStart method (String.prototype.padStart, standardized in ES2017), which is available in Node.js 8 and later and in all modern browsers (but not Internet Explorer). The TC39 committee has done an excellent job of adding language features that fill the gaps previously plugged by one-off helper packages. However, inertia is still a confounding factor, as somebody has to do the work of removing helper packages and refactoring to built-in language features. And adopting these new language features requires dropping support for older versions of Node.js (which may be technically unsupported but are still broadly used in practice).
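For comparison, a few calls to the built-in padStart (and its right-hand counterpart, padEnd) show that it covers the same ground as the helper packages with no dependency at all. This assumes an ES2017-capable runtime such as Node.js 8+ or a modern browser; the variable names are only for illustration.

```javascript
// String.prototype.padStart / padEnd (ES2017) replace left-pad-style helpers.
const zeroPadded = "5".padStart(3, "0");   // "005"
const spacePadded = "abc".padStart(6);     // "   abc" (pads with spaces by default)
const untouched = "abcdef".padStart(3);    // "abcdef" (never truncates)
const repeated = "7".padStart(4, "ab");    // "aba7" (pad string repeats and is cut to fit)
const rightPadded = "id".padEnd(5, ".");   // "id..." (padEnd pads on the right)

console.log([zeroPadded, spacePadded, untouched, repeated, rightPadded]);
```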

Building atop the rubble

The support matrix is even choppier for web applications. The aforementioned padStart method doesn't exist in Internet Explorer 11, and neither do most of the other convenience features added in ES2015 and later. Safari 13 lacks support for BigInt and requestIdleCallback. Edge has caught up a lot since its switch to the Blink rendering engine, but pre-Blink Edge didn't support setting scroll positions on elements (for example, with Element.scrollTo) or Array.prototype.flat/flatMap. Most modern features work in most modern browsers, but you'll still spend a lot of mental cycles making sure nothing slips through the gaps, especially if you need to support IE11.
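One way to avoid surprises is to test for features at runtime rather than assuming them. The sketch below is a hypothetical helper (`detectFeatures` is not a library API) that checks for the specific features mentioned above. It is written in ES5 syntax so that the check itself can still run in older browsers such as IE11.

```javascript
// Hypothetical helper: report which of the features discussed above exist
// in the current environment. `root` is the global object to inspect.
function detectFeatures(root) {
  return {
    padStart: typeof String.prototype.padStart === "function",
    arrayFlat: typeof Array.prototype.flat === "function",
    arrayFlatMap: typeof Array.prototype.flatMap === "function",
    bigInt: typeof root.BigInt === "function",
    requestIdleCallback: typeof root.requestIdleCallback === "function",
    // Element only exists where there is a DOM (browsers, not plain Node.js).
    elementScrollTo:
      !!root.Element && typeof root.Element.prototype.scrollTo === "function"
  };
}

// In a browser this inspects `window`; in Node.js 12+ it falls back to `globalThis`.
var support = detectFeatures(typeof window !== "undefined" ? window : globalThis);
var missing = Object.keys(support).filter(function (name) {
  return !support[name];
});
console.log("Missing features:", missing);
```

Checks like this are useful for diagnosing a specific browser, but hand-maintaining them for every feature you use is exactly the kind of bookkeeping the build toolchain is meant to take off your plate.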

Fortunately, there's a pretty robust toolchain for using the latest language features in web applications while maintaining support for older browsers. It goes something like this:

  • webpack combines your source code into shippable bundles, runs each file through loaders to perform any necessary transpilation, and also handles extras like minification.
  • Babel transpiles JavaScript to remove syntax that's unsupported in older browsers (for example, arrow functions are turned into regular functions to avoid breaking IE11). Babel can also handle polyfilling language features that you depend on, using...
  • core-js provides implementations of recent language features: array/string convenience methods, completely new built-in objects like Promise, Map, and Symbol, and more. (A few features, such as Proxy, can't be polyfilled at all.) Babel can automatically detect which language features are used in your code and hook up the appropriate core-js implementation.
  • Browserslist is a shared configuration format for specifying which browsers you want to support. It accepts literal versions like IE 11 or queries like > 1% (browser versions with more than 1% global usage), last 3 Chrome versions, etc.
  • caniuse-lite is a compact copy of the Can I Use database of browser versions, usage statistics, and feature support. Browserslist uses it to turn queries into concrete browser versions, which Babel and other tools then use to determine what needs to be transpiled or polyfilled for the browsers you've requested.

With this toolchain in place, you can happily write JavaScript using the latest language features and not worry about browser support, which is great for productivity and provides a good end-user experience as well. But it comes at a cost — the packages listed above and more end up in your node_modules, and they aren't small. Webpack itself is 2.7MB, core-js is something like 7MB, Babel and its accessory packages come in at around 10MB, and caniuse-lite is 3.2MB worth of data — it adds up. And there's nothing really egregious here in a vacuum; it's unsurprising, for example, that the implementations of hundreds of modern JavaScript language features collectively weigh 7MB. But it's certainly a major contributing factor to the overall size of the average node_modules. We've traded an eye-opening amount of disk space for a great developer workflow and a consistent experience for end users.
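If you're curious where the megabytes in your own project are going, a small Node.js script can total up a package's files. This is a hedged sketch: `directorySize` is a hypothetical helper, it measures apparent file sizes (so its numbers can differ somewhat from what `du` reports), and it skips symlinks rather than following them.

```javascript
const fs = require("fs");
const path = require("path");

// Recursively sum the sizes (in bytes) of regular files under `dir`.
// Directory entries that are symlinks are neither files nor directories
// according to their Dirent, so they are skipped.
function directorySize(dir) {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      total += directorySize(fullPath);
    } else if (entry.isFile()) {
      total += fs.statSync(fullPath).size;
    }
  }
  return total;
}

// Usage: report a few of the packages named above, if they are installed
// in ./node_modules relative to the current working directory.
for (const pkg of ["webpack", "core-js", "caniuse-lite"]) {
  const pkgDir = path.join("node_modules", pkg);
  if (fs.existsSync(pkgDir)) {
    const mb = directorySize(pkgDir) / (1024 * 1024);
    console.log(`${pkg}: ${mb.toFixed(1)} MB`);
  }
}
```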

Packages on packages

Did you know that both npm and yarn will happily install multiple versions of the same package? Imagine you've got package A and package B in your dependencies list. Both A and B depend on package C, but with incompatible version requirements. In Ruby, Bundler reports this as a resolution error, and you're left to work out a consistent dependency tree on your own. npm and yarn, on the other hand, will install multiple versions of package C. They accomplish this by placing a conflicting version of C in a nested node_modules folder inside the package that needs it. Node.js resolves a dependency by walking up the filesystem from the requiring file and using the closest node_modules folder that contains it, so packages without conflicts can be deduped to the top level while conflicting versions are kept in nested directories.
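The lookup described above can be modeled in a few lines. This is a simplified sketch, not Node's actual implementation: it ignores details like package `exports`, `NODE_PATH`, and symlinks, and `candidateDirs`, `resolvePackage`, and the in-memory `installed` map are hypothetical stand-ins for the real filesystem. It shows how package A picks up its nested copy of C while package B falls through to the top-level copy.

```javascript
const path = require("path").posix;

// List the directories to check for package `name`, nearest first,
// walking from `fromDir` up to the filesystem root.
function candidateDirs(fromDir, name) {
  const dirs = [];
  let dir = fromDir;
  while (true) {
    // Like Node.js, don't look for node_modules/node_modules.
    if (path.basename(dir) !== "node_modules") {
      dirs.push(path.join(dir, "node_modules", name));
    }
    const parent = path.dirname(dir);
    if (parent === dir) break; // reached the root
    dir = parent;
  }
  return dirs;
}

// Return the first installed copy of `name` visible from `fromDir`, or null.
function resolvePackage(fromDir, name, installed) {
  for (const candidate of candidateDirs(fromDir, name)) {
    if (installed.has(candidate)) {
      return { dir: candidate, version: installed.get(candidate) };
    }
  }
  return null;
}

// A needs C 1.x and B needs C 2.x: one copy is nested, the other hoisted.
const installed = new Map([
  ["/app/node_modules/a", "1.0.0"],
  ["/app/node_modules/a/node_modules/c", "1.4.0"],
  ["/app/node_modules/b", "1.0.0"],
  ["/app/node_modules/c", "2.1.0"],
]);

console.log(resolvePackage("/app/node_modules/a", "c", installed)); // nested 1.4.0
console.log(resolvePackage("/app/node_modules/b", "c", installed)); // top-level 2.1.0
```

In a real project, `require.resolve.paths("some-package")` returns the list of directories Node.js would actually search from the current module.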

There are certainly some benefits to this approach. I have spent many long hours working through version conflicts in Ruby, where seemingly unrelated gems demand inconsistent versions of a shared dependency. But this approach inevitably results in a lot of duplicate packages, and there's often not much you can do about it. To some extent, this behavior is a necessary consequence of an ecosystem with a greater reliance on helper packages. It would be hellacious trying to get dozens of packages to agree on the same set of helper versions; it's bad enough in Ruby, where usually only a few packages are in conflict. Regardless, duplicate package versions should be kept in the back of your mind when trying to understand node_modules bloat.

So where does that leave us?

Hopefully, this article leaves you with a better sense of how we got here and where the ecosystem is headed. To a large extent, I expect the problem to recede on its own as new, more robust standard library features gain broad support and replace obsolete helper packages. But it's a naturally slow process, made even slower by inertia and by the need for tooling to support legacy browsers. As a JavaScript engineer, you can best speed the process along by learning and spreading awareness of the latest and greatest features in the standard library. You could even send pull requests upstream if you find that you're using a package that pulls in a lot of obsolete helpers. npm ls and npm explain (aliased as npm why since npm 7), or yarn list and yarn why, are great aids for learning about your dependency tree and where each package comes from.
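To hunt for duplicates specifically, you can post-process the JSON form of `npm ls`. The sketch below assumes the nested `dependencies`/`version` shape that `npm ls --json` prints (on npm 7 and later, add `--all` to include transitive dependencies). `findDuplicateVersions` is a hypothetical helper, not part of npm, and versions are sorted as plain strings rather than by semver.

```javascript
// Walk the nested `dependencies` objects of an `npm ls --json`-style tree and
// return { packageName: [versions...] } for every name seen with 2+ versions.
function findDuplicateVersions(tree) {
  const versionsByName = new Map();

  function walk(dependencies) {
    for (const [name, info] of Object.entries(dependencies || {})) {
      if (info.version) {
        if (!versionsByName.has(name)) versionsByName.set(name, new Set());
        versionsByName.get(name).add(info.version);
      }
      walk(info.dependencies);
    }
  }

  walk(tree.dependencies);

  const duplicates = {};
  for (const [name, versions] of versionsByName) {
    if (versions.size > 1) duplicates[name] = [...versions].sort(); // string sort, not semver
  }
  return duplicates;
}

// Usage: run `npm ls --all --json > tree.json`, then pass the parsed JSON in.
// Here a small hand-written tree stands in for real npm output.
const sample = {
  name: "my-app",
  dependencies: {
    a: { version: "1.0.0", dependencies: { c: { version: "1.4.0" } } },
    b: { version: "1.0.0", dependencies: { c: { version: "2.1.0" } } },
    c: { version: "2.1.0" },
  },
};
console.log(findDuplicateVersions(sample));
```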

The last thought I'll leave you with is this: don't stress too much about it. Be honest. When was the last time you spent even a few minutes dealing with a problem caused by 100MB of used hard drive space? I'm fairly certain that I've invested more brain cycles writing this article than I've ever spent on that particular class of problem. It feels wrong and can be hard to stomach, especially if you were programming in a time when hard drive space was at a premium. But it's just not that big of an issue in practice, and if it does arise, it's easily solved by spending a fairly negligible amount of money. As with any issue, you're best served focusing your mental energy where it creates the most leverage, which is usually solving hard business problems to provide value to your end users.
