Demystifying Webpack

Adarsh - Sep 18 '18 - Dev Community

Most of us have used webpack at one point or another. It is by far the most popular bundler, thanks to its vast collection of loaders and the customisability it brings to the bundling process. In a way, webpack has influenced the growth of certain JavaScript ecosystems. But how often have we thought of opening the bundled file to understand what happened during bundling? How does an app made up of hundreds of individual files work so beautifully and cohesively from that one bundled file? Let's break down the concepts of webpack and understand what happens during the bundling process. I won't go over the elements of the webpack configuration, as they are covered in detail in the webpack documentation; instead, this post focuses on the core concepts of webpack.

What is a bundler?

Before we go any further, let's understand what a bundler is. A bundler is a utility that takes a number of files and puts them together in such a way that it doesn't change how the code works. This allows you to write code in a modular fashion but serve it as a single monolithic file.

Why do we need a bundler?

With maintainability and reusability in mind, we increasingly write code in modules. This modular style works fine while the application is small. But as applications grow in complexity and size, it becomes difficult to manage the increasing number of dependencies and files when running this modularised code. For example, suppose you are creating an HTML/JavaScript application that consists of 50 JS modules. You cannot afford to have 50 script tags in your HTML to load them into the page. This is where a bundler kicks in: it bundles all 50 files together and gives you one file, which you can load from your HTML with a single script tag.

Demystifying webpack

Okay, enough of the basics. Let's dive into webpack now.

Consider the following three files:

// A.js

const B = require('./B');

B.printValue();

// B.js

const C = require('./C.js')

const printValue = () => {
  console.log(`The value of C.text is ${C.text}`);
};

module.exports = {
  printValue,
};

// C.js

module.exports = {
  text: 'Hello World!!!',
};

I have defined A.js as my entry point for webpack and the output to be a single bundled file. When you run the webpack build, two things happen:

  1. Form the dependency graph
  2. Resolve the dependency graph and Tree-Shaking

Form the dependency graph

The first thing webpack does is analyse the modules that are present and form a dependency graph. A dependency graph is a directed graph that describes how each module is connected to the others. Dependency graphs are also widely used by package managers such as npm, Maven and snap. Webpack starts from the entry point, A.js, and our graph initially looks like this, with just one node.

[Figure: Initial. A single node, A.]

Webpack then finds that A.js requires B.js, so it creates a link from A to B in the graph.

[Figure: Second. A -> B]

Next, analysing B.js, it figures out that B.js needs C.js as well, so it creates a link from B to C in the graph.

[Figure: Third. A -> B -> C]

Now, hypothetically, if A.js also required another file called D.js, which in turn required C.js, the graph would become:

[Figure: Extra. A -> B -> C and A -> D -> C]

See, it's relatively simple stuff. When webpack reaches C.js, it realises that C.js has no further dependencies, and so it outputs the complete dependency graph.
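
The graph-building walk above can be sketched in a few lines. This is a simplified illustration, not webpack's actual implementation: it assumes a hypothetical in-memory files map in place of the file system, and it finds require calls with a regular expression, whereas webpack parses each module properly. The names files, resolveRequest, findRequires and buildGraph are made up for this example.

```javascript
// Hypothetical in-memory "file system": file name -> source text.
const files = {
  './A.js': "const B = require('./B');\nB.printValue();",
  './B.js': "const C = require('./C.js');\nmodule.exports = { printValue: () => console.log(C.text) };",
  './C.js': "module.exports = { text: 'Hello World!!!' };",
};

// Treat './B' and './B.js' as the same module (a simplifying assumption).
const resolveRequest = (request) =>
  request.endsWith('.js') ? request : `${request}.js`;

// Collect the target of every require('...') call found in a source string.
const findRequires = (source) =>
  [...source.matchAll(/require\(\s*['"](.+?)['"]\s*\)/g)].map((match) =>
    resolveRequest(match[1])
  );

// Start at the entry point and keep following requires until no new
// modules turn up. The result maps each module to its direct dependencies.
function buildGraph(entry) {
  const graph = new Map();
  const queue = [entry];
  while (queue.length > 0) {
    const file = queue.shift();
    if (graph.has(file)) continue; // already analysed
    const dependencies = findRequires(files[file]);
    graph.set(file, dependencies);
    queue.push(...dependencies);
  }
  return graph;
}

const graph = buildGraph('./A.js');
console.log(graph);
```

Each key of the resulting Map is a node, and each entry in its array is an outgoing link, so A points to B, B points to C, and C has no links.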

Resolving the modules

Okay, now webpack has the graph and the modules. It must put all of them into one file, so it takes one node at a time from the graph, starting from the root node, A.js. It copies the content of A.js to the output file, marks the node as resolved and then moves on to the children of A.js. If a module that was already resolved appears again, webpack simply skips it. It keeps adding the content of the modules to the output file in this way until it has finished traversing the dependency graph.
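
The traversal described above might look like the following sketch. It assumes the hypothetical graph that includes D.js, stand-in module contents, and a depth-first visiting order; the order webpack really uses is not specified here. The names dependencyGraph, sources and bundleModules are illustrative only.

```javascript
// Hypothetical graph: A requires B and D, and both B and D require C.
const dependencyGraph = {
  'A.js': ['B.js', 'D.js'],
  'B.js': ['C.js'],
  'D.js': ['C.js'],
  'C.js': [],
};

// Stand-in module contents; real modules would hold actual source code.
const sources = {
  'A.js': '/* content of A.js */',
  'B.js': '/* content of B.js */',
  'C.js': '/* content of C.js */',
  'D.js': '/* content of D.js */',
};

function bundleModules(entry) {
  const resolved = new Set(); // nodes that were already copied to the output
  const output = [];

  function visit(node) {
    if (resolved.has(node)) return; // seen before, so skip it
    resolved.add(node); // mark the node as resolved
    output.push(sources[node]); // copy its content to the output
    dependencyGraph[node].forEach((child) => visit(child)); // then its children
  }

  visit(entry);
  return output.join('\n');
}

const bundleText = bundleModules('A.js');
console.log(bundleText);
```

Although C.js is reachable through both B.js and D.js, its content lands in the output only once, because the second visit is skipped.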

Tree-Shaking

Tree-Shaking is the process of removing dead code from the output. While webpack is creating the graph, it also marks whether each module is used or not. If a module is not used anywhere, webpack removes it, as it is effectively dead code. Note that tree shaking relies on the static structure of ES module syntax (import and export), and that by default webpack does this only in production mode.
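
As a rough sketch of the marking idea described above: walk the graph from the entry point, mark every module that is reached, and drop the rest. The unused.js module and the names moduleDependencies, markUsed and shake are hypothetical. In practice webpack's tree shaking works at the finer level of individual exports, so this module-level version is a simplification.

```javascript
// Every module known to the build, with its direct dependencies.
const moduleDependencies = {
  'A.js': ['B.js'],
  'B.js': ['C.js'],
  'C.js': [],
  'unused.js': ['C.js'], // hypothetical module that nothing requires
};

// Mark phase: collect every module reachable from the entry point.
function markUsed(entry) {
  const used = new Set();
  const stack = [entry];
  while (stack.length > 0) {
    const current = stack.pop();
    if (used.has(current)) continue;
    used.add(current);
    stack.push(...moduleDependencies[current]);
  }
  return used;
}

// Sweep phase: keep only the modules that were marked as used.
function shake(entry) {
  const used = markUsed(entry);
  return Object.keys(moduleDependencies).filter((name) => used.has(name));
}

const kept = shake('A.js');
console.log(kept);
```

Here unused.js is dropped because no path from A.js leads to it, even though it has a dependency of its own.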

Let's take a look at the bundled code for the three files A.js, B.js and C.js.

/******/ (function(modules) { // webpackBootstrap
/******/    // Load entry module and return exports
/******/    return __webpack_require__(__webpack_require__.s = 0);
/******/ })
/************************************************************************/
/******/ ([
/* 0 */
/***/ (function(module, exports, __webpack_require__) {

// A.js

const B = __webpack_require__(1);

B.printValue();

/***/ }),
/* 1 */
/***/ (function(module, exports, __webpack_require__) {

// B.js

const C = __webpack_require__(2)

const printValue = () => {
  console.log(`The value of C.text is ${C.text}`);
};

module.exports = {
  printValue,
};

/***/ }),
/* 2 */
/***/ (function(module, exports) {

// C.js

module.exports = {
  text: 'Hello World!!!',
};

/***/ })
/******/ ]);

You can immediately recognise that it's an IIFE. The function takes in a list of modules and runs the entry module, which in turn pulls in the other modules it requires. We can see that the first module is our entry file A.js, the second is B.js and the third is C.js. We can also see that each of those modules has been wrapped in a function that can be executed.

The module parameter is the replacement for the default Node module object, exports is the replacement for the exports object, and __webpack_require__ is the replacement for the require function used in our programs. The // webpackBootstrap section contains the implementation of the bootstrap function, which is quite long. Let's just look at the implementation of __webpack_require__:

function __webpack_require__(moduleId) {
/******/
/******/        // Check if module is in cache
/******/        if(installedModules[moduleId]) {
/******/            return installedModules[moduleId].exports;
/******/        }
/******/        // Create a new module (and put it into the cache)
/******/        var module = installedModules[moduleId] = {
/******/            i: moduleId,
/******/            l: false,
/******/            exports: {}
/******/        };
/******/
/******/        // Execute the module function
/******/        modules[moduleId].call(module.exports, module, module.exports, __webpack_require__);
/******/
/******/        // Flag the module as loaded
/******/        module.l = true;
/******/
/******/        // Return the exports of the module
/******/        return module.exports;
/******/    }

The code is quite simple to understand. It takes in a moduleId and checks whether that module is present in the installedModules cache. If it is, the cached exports are returned straight away; if not, a new entry is created in the cache. The next line, modules[moduleId].call(module.exports, module, module.exports, __webpack_require__);, actually executes the module function from the modules array that we passed earlier to the parent function. Comparing that to the fn.call() syntax, we can deduce that the this value inside the module function is the exports object of the newly created module object, module is the object created earlier, exports is that same exports object, and __webpack_require__ is the function itself. It then flags the module as loaded in the cache and returns the exports of the module.

That's all, folks. This is how webpack works at a fundamental level. Webpack does many more powerful things, such as minimising the initial load by ordering modules in a particular way, which I highly encourage you to go and explore.

It's always better to understand how a utility works before we begin to use it. This helps us write better-optimised code, keeping in mind the inner workings and constraints of the utility we are using.
