✨ What is this post about: As a part of my professional growth, I make time to watch conference talks on Ruby, Rails, JS, React, tech writing, and tech trivia. Previously, I'd just watch them but now I will take and publish notes for future reference. This talk was a part of RailsConf 2021 that I'm participating in at the time of writing.
✨ Talk: 'A Day in the Life of a Ruby Object' by Jemma Issroff
✨ One-paragraph summary: In this talk, we’ll walk through the lifespan of a Ruby object from birth to the grave: from .new to having its slot reallocated. We’ll discuss object creation, the Ruby object space, and an overview of garbage collection.
✨ If you can't watch the talk, see Jemma's blog posts in the Read more section
✨ Impression: Jemma is a great teacher who has made this talk beginner-friendly by explaining key concepts and providing great visuals. She's funny and smart, which makes the talk not overwhelming. This is what I'm all for: dense tech talks, especially by women technologists.
Table of contents
- Notes
- Terminology
- Object
- Incremental Garbage Collection: Tri-Color Mark and Sweep Algorithm
- Generational Garbage Collection
- Compaction
- Side notes
- Read more
Notes
- great opening anecdote: Jemma created 1,000,000,000 objects (which would be 40GB, at 40 bytes per object) on a computer that had only 16GB of free space, and the task was completed with no problem thanks to the gift of 🎁 garbage collection 🎁
- this was possible because we didn't ask Ruby to remember these objects (we were not referring to them) so Ruby followed a full object lifecycle
- we could see that the objects were created and then bulldozed by running
GC.stat(:count)
(GC -> Garbage Collector)
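Here is a minimal sketch of that experiment, scaled down so it finishes quickly. The helper name and the object count are my own choices, not from the talk; the only API it relies on is GC.stat(:count).

```ruby
# Hypothetical helper (not from the talk): allocate throwaway objects and
# report how many GC runs happened while doing so.
def gc_runs_while_allocating(object_count)
  before = GC.stat(:count)           # total number of GC runs so far
  object_count.times { Object.new }  # nothing keeps a reference to these
  GC.stat(:count) - before
end

puts "GC ran #{gc_runs_while_allocating(2_000_000)} times"
```

The exact number printed depends on the Ruby version and heap settings, but it should be above zero because the unreferenced objects keep being collected.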
Terminology
- Operating System Heap: most of a machine's memory, which includes Ruby Heap
- Ruby Heap: place in the machine's memory where our Ruby objects live and die
- it sometimes references Operating System Heap, it can sometimes change its size
- it's made of memory objects called pages (when the Ruby Heap is asking for more space, it asks in increments of pages)
- Page: a unit in Ruby Heap
- each page has a header with some info and about 409 slots (each slot is 40 bytes in size), which is exactly where we store the objects
- some slots have rvalues, which are Ruby's internal representations of an object
- in some cases, the value of an object is too long for the rvalue to hold and in that case, the rvalue will point to an external memory address in the Operating System Heap
- in some cases rvalues contain pointers to other rvalues
- Root RVALUE: values that the program will always know about that are vital to the running of the program
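To make pages and slots a bit more tangible, here is a small sketch that asks the running Ruby about its own heap. The helper name is hypothetical, and the keys are an assumption on my part: they differ between Ruby versions (for example, the slot size constant is RVALUE_SIZE on older Rubies and BASE_SLOT_SIZE on newer ones), so missing values simply come back as nil.

```ruby
# Hypothetical helper: a snapshot of the Ruby Heap as reported by the GC.
def heap_snapshot
  stats  = GC.stat
  consts = GC::INTERNAL_CONSTANTS
  {
    pages:          stats[:heap_allocated_pages],  # pages currently allocated
    live_slots:     stats[:heap_live_slots],       # slots holding an object
    free_slots:     stats[:heap_free_slots],       # slots ready for reuse
    slot_size:      consts[:RVALUE_SIZE] || consts[:BASE_SLOT_SIZE],
    slots_per_page: consts[:HEAP_PAGE_OBJ_LIMIT]
  }
end

heap_snapshot.each { |name, value| puts "#{name}: #{value.inspect}" }
```

On the Ruby versions from around the time of the talk, slot_size should print 40 and slots_per_page something close to 409; newer versions may report different values.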
Object
-
#<Object:0x00007fd1c69b8058>
- the sequence of characters after "Object:" is the memory address; we can't access this object again because we didn't save it to a variable, so it gets cleaned up
- sometimes the size of the Ruby Heap doesn't represent the memory consumption of your program because of the external pointers (for strings, the boundary is between 23 and 24 characters: a string of up to 23 characters fits in the rvalue, a longer one is stored externally)
- where the object is stored influences the processing time
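One way to see the effect of external storage is ObjectSpace.memsize_of from the objspace standard library. This is only a sketch: the helper name is mine, and the 23/24 character boundary comes from the talk, so newer Ruby versions may place it elsewhere.

```ruby
require 'objspace'

# Hypothetical helper: memory reported for a string of the given length.
def string_footprint(length)
  ObjectSpace.memsize_of('a' * length)
end

# 23 and 24 are the boundary mentioned in the talk; 1_000 is clearly too
# long to fit inside a single slot.
[23, 24, 1_000].each do |length|
  puts "#{length} chars: #{string_footprint(length)} bytes"
end
```

I'm leaving the exact byte counts out on purpose because they depend on the Ruby version, but the long string should always report more memory than a short one.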
Incremental Garbage Collection: Tri-Color Mark and Sweep Algorithm
- a really cool algo that Ruby uses in GC to determine which rvalues can have their slots reallocated (meaning, they are not important to the running of our program) <- Jemma's presentation visualizes this process in a wonderful way and maybe these next points won't be of much help without the visual
- it's tri-color (white, black, grey) and not bi-color because of "stop the world"
- Ruby pauses the execution of our programs to do garbage collection and it could get lengthy in bigger programs
- the grey allows us to stop the garbage collection (not to look at some rvalues) to allow our programs to run
- so, Ruby really uses "incremental Garbage Collection": short intervals of GC, each one picking up where the previous one left off
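Since the visuals are missing here, the marking idea can be sketched as a toy simulation in plain Ruby. This is not how Ruby implements it internally: the graph, the method name, and the step_budget parameter are all made up for illustration, and real incremental GC also needs write barriers, which this sketch ignores.

```ruby
# Toy tri-color marking. `graph` maps each object to the objects it
# references; `roots` are the objects the program always knows about.
def tri_color_mark(graph, roots, step_budget: 2)
  color = Hash.new(:white)             # white: not visited yet
  roots.each { |r| color[r] = :grey }  # grey: alive, references not scanned yet
  increments = 0

  until (grey = color.select { |_, c| c == :grey }.keys).empty?
    # One short GC interval: scan a few grey objects, then "pause".
    grey.first(step_budget).each do |obj|
      graph.fetch(obj, []).each do |child|
        color[child] = :grey if color[child] == :white
      end
      color[obj] = :black              # black: alive and fully scanned
    end
    increments += 1
  end

  survivors = graph.keys.select { |obj| color[obj] == :black }
  { survivors: survivors, garbage: graph.keys - survivors, increments: increments }
end

graph = { root: [:a, :b], a: [:c], b: [], c: [], d: [:e], e: [] }
result = tri_color_mark(graph, [:root])
puts "survivors: #{result[:survivors].inspect}"
puts "garbage:   #{result[:garbage].inspect}"
```

Objects that are still white at the end (here :d and :e, which nothing reachable refers to) are the ones whose slots can be reallocated.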
Generational Garbage Collection
- "weak generational hypothesis": "Most objects die young" 🪦
- we can manually trigger Garbage Collection by
GC.start
- it takes an optional parameter,
full_mark
- with
full_mark: false
it only looks at the young objects <- "Minor Garbage Collection"
- with
full_mark: true
it looks at every object <- "Major Garbage Collection"
- Major GC runs when the Minor GC hasn't freed up enough space
- If you want to see statistics about all the work that GC has done, you can run
GC.stat
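Putting the two together, a short sketch that requests a minor and then a major collection and compares the counters from GC.stat. The helper names are hypothetical, and I'm assuming the :minor_gc_count and :major_gc_count keys, which the Ruby versions I know of expose.

```ruby
# Hypothetical helper: the GC counters we care about.
def gc_counts
  stats = GC.stat
  { total: stats[:count], minor: stats[:minor_gc_count], major: stats[:major_gc_count] }
end

# Hypothetical helper: request a minor, then a major collection.
def run_minor_then_major
  before = gc_counts
  GC.start(full_mark: false)  # ask for a minor GC (young objects only)
  GC.start(full_mark: true)   # ask for a major GC (every object)
  [before, gc_counts]
end

before, after = run_minor_then_major
puts "before: #{before.inspect}"
puts "after:  #{after.inspect}"
```

Ruby is free to upgrade a requested minor GC to a major one, so the minor counter is not guaranteed to move, but the major counter and the total should both go up.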
Compaction
- rvalues are fragmented: they are not batched up together but instead spread across pages
- an algorithm pulls them together into fewer pages to take up less space
- because of that, the memory address may change, as visible below:
obj_array = 5.times.map{Object.new}
obj = Object.new
# => #<Object:0x00007fd1c69b8058>
obj_array = nil
GC.compact
obj
# => #<Object:0x00007fd1c69c9060>
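A caveat worth sketching: as far as I know, GC.compact only exists from Ruby 2.7 on and isn't supported on every platform, so a guarded call is safer in scripts. The helper name below is my own.

```ruby
# Hypothetical helper: compact the heap only where the running Ruby supports it.
def try_compact
  return :unsupported unless GC.respond_to?(:compact)

  GC.compact
  :compacted
rescue NotImplementedError
  # Some builds define the method but cannot compact on this platform.
  :unsupported
end

puts "compaction: #{try_compact}"
```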
Side notes
-
1_000_000_000.times {Object.new}
-> 40GB, because one object in Ruby is 40 bytes
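The arithmetic behind that number, as a tiny sketch. The constant and method names are mine; the 40-byte figure is the one from the talk.

```ruby
# Assumption from the talk: one plain object occupies one 40-byte slot.
SLOT_SIZE_BYTES = 40

def bytes_needed(object_count, slot_size = SLOT_SIZE_BYTES)
  object_count * slot_size
end

total = bytes_needed(1_000_000_000)
puts "#{total} bytes = #{total / 1_000_000_000} GB"
# prints "40000000000 bytes = 40 GB"
```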