Photo by Jilbert Ebrahimi on Unsplash
In our previous article we went through bytecodes! Now let's look at something a little more in-depth!
Garbage Collection
There was a time when humans needed to write code with memory management in mind but, as the years went by, we stopped needing to worry about it. This is thanks to one magical tool called the Garbage Collector (GC).
Garbage collection is a common practice for memory management in most languages. The only job of a GC is to reclaim the memory that is being occupied by unused objects. It was first used in LISP in 1959.
But how does it know when an object is not used anymore?
Memory Management in Node.js
Since we do not need to worry about memory anymore, it's fully handled by the runtime (in Node.js, the V8 engine). Memory is allocated automatically when we create a new variable, and it's automatically cleaned up when it is no longer needed.
The GC knows when objects are no longer used by looking at their references, that is, how they reference each other. When an object can no longer be reached through any chain of references starting from the root, it is garbage collected, even if it still references other objects. Take a look at this diagram:
You can see a few objects that reference each other, but there are two objects that neither reference nor are referenced by any other object. These will be deleted and their memory reclaimed. This is the diagram after the GC sweep:
The downsides of using garbage collectors are that they can have a significant performance impact and can cause unpredictable stalls.
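To get a feel for those stalls, Node.js can report GC activity through the perf_hooks module. The snippet below is only a minimal sketch: the `pauses` array and the throwaway `junk` allocations are made up for illustration, and whether any GC entry shows up (and how long it takes) depends on your Node.js version and machine.

```javascript
const { PerformanceObserver } = require('node:perf_hooks')

// Illustrative only: collect the duration of every GC run we are told about.
const pauses = []

const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    pauses.push(entry.duration)
    console.log(`GC run took ${entry.duration.toFixed(2)} ms`)
  }
})

// 'gc' is one of the entry types Node.js exposes to PerformanceObserver.
observer.observe({ entryTypes: ['gc'] })

// Allocate a lot of short-lived objects to give the GC something to do.
let junk = []
for (let i = 0; i < 200000; i++) {
  junk.push({ i })
}
junk = null

// Stop observing shortly after, so the script can finish cleanly.
setTimeout(() => observer.disconnect(), 100)
```

Nothing here forces a collection, so the output may be empty on a quiet run.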
Memory management in practice
In order to show how memory management works, let's take a simple example:
function add (a, b) {
  return a + b
}
add(4, 5)
We have a few layers we need to know:
- The stack: The stack is where all local variables, pointers to objects and application control flow live. In our function, both parameters will be placed on the stack.
- The heap: The heap is the part of our program where reference type objects are stored, like strings or objects. So the Point object below will be placed on the heap.
function Point (x, y) {
  this.x = x
  this.y = y
}
const point1 = new Point(1, 2)
If we take a look at the memory footprint in the heap, we would have something like this:
root -----------> point1
Now let's add some other points:
function Point (x, y) {
  this.x = x
  this.y = y
}
const point1 = new Point(1, 2)
const point2 = new Point(2, 3)
const point3 = new Point(4, 4)
We'd have this:
|-------------------> point1
root |-------------------> point2
|-------------------> point3
Now, if the GC ran, nothing would happen, since all of our objects are still referenced from the root object.
Let's add some objects in the middle:
function Chart (name) {
  this.name = name
}
function Point (x, y, name) {
  this.x = x
  this.y = y
  this.name = new Chart(name)
}
const point1 = new Point(1, 2, 'Chart1')
const point2 = new Point(2, 3, 'Chart2')
const point3 = new Point(4, 4, 'Chart3')
Now we would have this:
|-------------------> point1 ----> Chart1
root |-------------------> point2 ----> Chart2
|-------------------> point3 ----> Chart3
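Now, what would happen if we set our point2 to undefined? (This assumes point2 is declared with let rather than const, since a const binding cannot be reassigned.)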
|-------------------> point1 ----> Chart1
root | point2 ----> Chart2
|-------------------> point3 ----> Chart3
Notice that, now, the point2 object (and the Chart2 object it references) cannot be reached from the root object. So, at the next GC run it would be eliminated:
|-------------------> point1 ----> Chart1
root
|-------------------> point3 ----> Chart3
This is basically how GC works: it walks from the root to all objects, and any object in the object list that was not visited during the walk cannot be reached from the root, so it is removed.
Garbage collection can be implemented in different ways.
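The walk described above can be imitated in plain JavaScript. This is just a sketch: `reachableFrom` and the `root` object are hypothetical stand-ins for what the engine does internally, and it reuses the Point and Chart constructors from the earlier example.

```javascript
// Collect every object reachable from `root` by following property references.
function reachableFrom (root) {
  const seen = new Set()
  const pending = [root]
  while (pending.length > 0) {
    const current = pending.pop()
    // Skip primitives, null/undefined and objects we already visited.
    if (current === null || typeof current !== 'object' || seen.has(current)) continue
    seen.add(current)
    pending.push(...Object.values(current))
  }
  return seen
}

function Chart (name) {
  this.name = name
}
function Point (x, y, name) {
  this.x = x
  this.y = y
  this.name = new Chart(name)
}

// `root` is a hypothetical stand-in for the real GC roots.
const root = {
  point1: new Point(1, 2, 'Chart1'),
  point2: new Point(2, 3, 'Chart2'),
  point3: new Point(4, 4, 'Chart3')
}

// Keep a handle only so we can ask about the object after detaching it.
const droppedPoint = root.point2
root.point2 = undefined

const live = reachableFrom(root)
console.log(live.has(root.point1), live.has(droppedPoint), live.has(droppedPoint.name))
// true false false
```

In a real program the `droppedPoint` variable would itself keep the object alive; here it only exists so the sketch can show that the walk from `root` no longer finds the point or its chart.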
GC Methods
There are many methods to handle GC.
New Space and Old Space
This is the method Node.js uses.
The heap has two main segments: the new space and the old space. The new space is where allocations actively happen; it is the fastest place to collect garbage and is about 1 to 8 MB in size. All objects in the new space are called the young generation.
In contrast, the old space is where the objects that survived the last garbage collection reside; in our case, the point1 and point3 objects are in the old space. They are called the old generation. Allocation in the old space is pretty fast; however, GC there is expensive, so it's performed rarely.
However, barely 20% of the young generation survives and gets promoted to the old generation, so the old space sweep does not need to run very often. It's only performed when this space is getting exhausted, which means around 512 MB (the actual default varies with the Node.js version and platform); you can set this limit with the --max-old-space-size flag in Node.js. To reclaim memory, the GC uses two different collection algorithms.
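If you want to see these numbers on your own machine, the built-in v8 module exposes heap statistics. The snippet below is a small sketch; the `toMb` helper is made up for readability, and the values printed will differ between Node.js versions and platforms.

```javascript
const v8 = require('node:v8')

// Hypothetical helper, just to print sizes in megabytes.
const toMb = (bytes) => Math.round(bytes / 1024 / 1024)

// heap_size_limit reflects the overall limit V8 is currently running with.
const { heap_size_limit: heapSizeLimit } = v8.getHeapStatistics()
console.log(`heap size limit: ${toMb(heapSizeLimit)} MB`)

// getHeapSpaceStatistics() returns one entry per heap space.
const spaces = v8.getHeapSpaceStatistics()
for (const space of spaces) {
  if (space.space_name === 'new_space' || space.space_name === 'old_space') {
    console.log(`${space.space_name}: ${toMb(space.space_used_size)} MB used of ${toMb(space.space_size)} MB`)
  }
}
```

Running the same file with and without the flag, for example node --max-old-space-size=1024 script.js (where script.js is whatever you named the file), should show a different heap size limit.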
Scavenge and Mark-Sweep Collection
The scavenge collection is fast and runs on the young generation, while the mark-sweep collection method is slower and runs on the old generation.
The Mark & Sweep algorithm works in just a few steps:
- It starts with the root objects. Roots are global variables that get referenced in the code. In JS this may be either the window object in the browser or, in Node, the global object. The complete list of all those roots is built by the GC.
- The algorithm then inspects all roots and all their children, marking each one as active, which means they're not garbage yet. Logically, anything the roots cannot reach will not be marked active, which means it is garbage.
- After that, all non-active objects are freed.
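The three steps above can be sketched with a toy heap. This is a simplified model and not how V8 is actually implemented: the `heap` array, the `allocate` helper and the `marked` flag are all assumptions made for illustration.

```javascript
// Toy heap: every allocated object is tracked so the sweep can visit it.
const heap = []

function allocate (name) {
  const object = { name, refs: [], marked: false }
  heap.push(object)
  return object
}

// Steps 1 and 2: start at the roots and mark everything reachable as active.
function mark (roots) {
  const pending = [...roots]
  while (pending.length > 0) {
    const object = pending.pop()
    if (object.marked) continue
    object.marked = true
    pending.push(...object.refs)
  }
}

// Step 3: free everything that was not marked, and reset marks for the next run.
function sweep () {
  const freed = []
  for (let i = heap.length - 1; i >= 0; i--) {
    if (heap[i].marked) {
      heap[i].marked = false
    } else {
      freed.push(heap[i].name)
      heap.splice(i, 1)
    }
  }
  return freed
}

const globalRoot = allocate('global')
const a = allocate('a')
const b = allocate('b')
const orphan = allocate('orphan')
globalRoot.refs.push(a)
a.refs.push(b)
orphan.refs.push(b) // referencing a live object does not keep `orphan` alive

mark([globalRoot])
console.log(sweep()) // [ 'orphan' ]
```

Note that `orphan` is freed even though it points at a live object: what matters is whether the roots can reach it, not what it references.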
Conclusion
We're one article away from ending our series! In this article, we've discussed memory handling and garbage collection; in the next one, we'll discuss how the compiler optimizes the whole code! Stay tuned!