Continuing my series showing Advent of Code solutions in MiniScript, last night's challenge — Day 7 — was in interesting one. We're given a terminal session showing a bunch of cd
and ls
commands, from which we need to reconstruct a file hierarchy, and then search that for some aggregate directory sizes.
More specifically: for Part A of the challenge, we needed to report the sum of all directories containing less than 100000 bytes. And for Part B, we needed to report the size of the smallest directory that was at least big enough that, if deleted, it would provide a given amount of free space on the disk.
A file hierarchy is clearly a tree structure, but for some reason, I quickly abandoned making an actual tree of little node objects for a different approach. But after completing the challenge and chatting with my spouse — who is doing these challenges in Chapel — I couldn't shake the feeling that I had done it the hard way.
So today, I'll share both solutions!
Solution #1: Directory Map
The data structure I actually chose last night was a map in which each key was a complete directory path, and the corresponding value was a list of [name, size] pairs. In the case of a directory, the pair was actually [name, "dir"] instead.
As I read the input file, I kept track of the current directory path as a string, using Mini Micro's built-in file module to find child and parent paths as needed. Then to find the total size of a directory, I wrote a recursive dirSize
function that added up all the sizes of its children (calling itself in the case of any entry with "dir" for a size). A similar recursive function was used to find all the small directories.
The complete program — after a little morning-after cleanup — comes to 89 lines, including both parts of the challenge.
f = file.open("input.txt")
//f = file.open("example.txt")
contents = {} // key: path; value: list of [name, size]
curDir = "/"
curEntries = []
contents[curDir] = curEntries
while not f.atEnd
line = f.readLine
if line == null then break // end of file
print line
m = line.match("$ cd {subdir}")
if m then
if m.subdir == "/" then
curDir = "/"
else if m.subdir == ".." then
curDir = file.parent(curDir)
else
newDir = file.child(curDir, m.subdir)
curDir = newDir
end if
if contents.hasIndex(curDir) then
curEntries = contents[curDir]
else
curEntries = []
contents[curDir] = curEntries
end if
else if line == "$ ls" then
continue
else
// file line
sizeOrDir = line.split(" ")[0]
name = line.split(" ")[1]
if sizeOrDir == "dir" then
curEntries.push [name, "dir"]
else
curEntries.push [name, sizeOrDir.val]
end if
end if
end while
f.close
dirSize = function(path)
entries = contents[path]
sum = 0
for e in entries
if e[1] == "dir" then
sum = sum + dirSize(file.child(path, e[0]))
else
sum = sum + e[1]
end if
end for
return sum
end function
resultA = []
findSmallDirs = function(path)
for e in contents[path]
if e[1] == "dir" then
subpath = file.child(path, e[0])
sz = dirSize(subpath)
if sz <= 100000 then
print "Small dir: " + subpath + " of size: " + sz
resultA.push sz
end if
findSmallDirs subpath
end if
end for
end function
findSmallDirs "/"
print "final answer (part A): " + resultA.sum
diskSize = 70000000
spaceUsed = dirSize("/")
curFree = diskSize - spaceUsed
spaceRequired = 30000000
needed = spaceRequired - curFree
print "Need to find " + needed + " bytes"
// Now find smallest dir that's at least needed bytes
best = null
for d in contents.indexes
sz = dirSize(d)
if sz < needed then continue
if best == null or sz < best then best = sz
end for
print "final answer (part B): " + best
It worked. But it wasn't super fast to code; it took me 28 minutes, the longest time yet. And my final rank was 1016, not quite reaching my goal of the top 1000.
Solution #2: A Tree of Nodes
So this morning, I coded up the other approach, of making a proper tree. I defined a Node class, which would be both the base class and the class instantiated for any file; and a DirNode subclass, which represents a directory. DirNodes keep a map of their children (from name to Node). We need to be able to get from a child to its parent, in order to process the cd ..
commands in the input, so I gave DirNode
an addChild
method that would both add the child to its map, and set the child's parent
to itself.
Finally, while a file Node would have a size
member set to its own size, DirNode.size
was assigned a function that added up the sizes of all its children. (Notice that in MiniScript, it's perfectly fine for a subclass to override a simple value with a function, or vice versa.)
The code for reading and processing the input came out very similar. The subsequent code was simpler, though. Given any node, we can simply check its .size
, and not worry about whether it's a file or a directory. (However I did find I had no quick way to find all the directory nodes; so I ended up adding an allDirs
list, and pushing each new DirNode onto that as it is made.)
The complete program comes to 87 lines — almost the same as last night's version.
f = file.open("input.txt")
//f = file.open("example.txt")
Node = {}
Node.name = ""
Node.size = 0
DirNode = new Node
DirNode.children = null // map from name to Node
DirNode.addChild = function(node)
self.children[node.name] = node
node.parent = self
end function
DirNode.size = function
sum = 0
for c in self.children.values
sum = sum + c.size
end for
return sum
end function
root = new DirNode
root.name = ""
root.children = {}
curDir = root
allDirs = []
while not f.atEnd
line = f.readLine
if line == null then break // end of file
print line
m = line.match("$ cd {subdir}")
if m then
if m.subdir == "/" then
curDir = root
else if m.subdir == ".." then
curDir = curDir.parent
else
curDir = curDir.children[m.subdir]
end if
else if line == "$ ls" then
continue
else
// file line
sizeOrDir = line.split(" ")[0]
name = line.split(" ")[1]
if sizeOrDir == "dir" then
noob = new DirNode
noob.children = {}
allDirs.push noob
else
noob = new Node
noob.size = sizeOrDir.val
end if
noob.name = name
curDir.addChild noob
end if
end while
f.close
// find sum of small dirs
resultA = []
for e in allDirs
sz = e.size
if sz <= 100000 then
print "Small dir: " + e.name + " of size: " + sz
resultA.push sz
end if
end for
print "final answer (part A): " + resultA.sum
diskSize = 70000000
spaceUsed = root.size
curFree = diskSize - spaceUsed
spaceRequired = 30000000
needed = spaceRequired - curFree
print "Need to find " + needed + " bytes"
// Now find smallest dir that's at least needed bytes
best = null
for d in allDirs
sz = d.size
if sz < needed then continue
if best == null or sz < best then best = sz
end for
print "final answer (part B): " + best
Conclusion
In the end, the two approaches were of similar length and complexity. The node approach feels cleaner somehow, but when actually doing the challenge for time, all that matters is the fastest (not always the cleanest) path to an answer. Would the tree approach have been faster? Hard to say.
As always, I come away from one of these challenges thinking about what sort of library code could have made it faster. Perhaps some methods for searching, filtering, and applying-methods-to trees? With those in hand, the tree approach would certainly have been faster than my custom map-to-lists code.
Once again, the Advent of Code challenge was a fun and thought-provoking exercise. I look forward to the days ahead!