Introduction
In this article, we'll explore a curious way to combine Zig's build system with its comptime metaprogramming to turn the whole thing into a sort of compiler for another language (in this case BF).
The technique is as follows:
- Use the build system to expose the source code of a BF program to the main executable.
- Parse the source code at compile time into an AST.
- Use the AST to generate the necessary code, effectively transpiling it to Zig.
- Let the build system compile the generated code into an executable.
- Profit!
Step 1: Expose the source code
We can do this inside build.zig by taking the path of the source file as a command-line argument (passed after `--`) and handing its contents to the executable via a build option:
```zig
...
// The BF source path is the first argument after `--`,
// e.g. `zig build -- program.b`.
const source_file_path = blk: {
    if (b.args) |args| if (args.len > 0) {
        break :blk args[0];
    };
    return error.MissingArgumentInputFile;
};

// The file is read while build.zig runs, not by the final executable.
const source_file = try std.fs.cwd().openFile(source_file_path, .{
    .mode = .read_only,
});
defer source_file.close();

var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();

const source = try source_file.readToEndAlloc(allocator, std.math.maxInt(usize));
defer allocator.free(source);

// Expose the contents to the executable as `@import("options").source`.
const options = b.addOptions();
options.addOption([]const u8, "source", source);
exe.root_module.addOptions("options", options);
...
```
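Here's a sketch of what the executable sees. The module behind `@import("options")` is generated by the build system and its exact text may vary between Zig versions, so the block below uses a hand-written stand-in struct (an assumption, not the real generated file) that exposes `source` the same way. Because `source` is a comptime-known `[]const u8`, ordinary Zig code can already inspect it at compile time; `bracketsBalanced` is a hypothetical helper for illustration, not part of the project:

```zig
const std = @import("std");

// Hypothetical stand-in for the module generated by `b.addOptions()`.
// In the real project this comes from `@import("options")`.
const options = struct {
    pub const source: []const u8 = "++++++++[>++++++++<-]>+.";
};

/// Hypothetical helper: returns true if every '[' has a matching ']'.
/// Works at runtime or, as in the test below, at comptime.
fn bracketsBalanced(src: []const u8) bool {
    var depth: usize = 0;
    for (src) |c| switch (c) {
        '[' => depth += 1,
        ']' => {
            if (depth == 0) return false; // stray ']'
            depth -= 1;
        },
        else => {},
    };
    return depth == 0;
}

test "the BF source is usable at comptime" {
    // Evaluated during compilation: a failing assert is a compile error.
    comptime std.debug.assert(bracketsBalanced(options.source));
    try std.testing.expectEqual(@as(usize, 24), options.source.len);
}
```

The project's own parser does no such validation; as its caveats describe, it simply tolerates unbalanced brackets.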
Step 2: Parsing the source code
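This also turned out to be pretty straightforward. Note that the code below isn't particularly fast: every node is appended with `++`, which copies the whole slice, so parsing is roughly quadratic in the number of commands. I'm sacrificing speed for ease of implementation. That's "fine though"™, as it only runs at compile time: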
```zig
const BfExpr = union(enum) {
    incAddr, // >
    decAddr, // <
    incMem, // +
    decMem, // -
    inp, // ,
    out, // .
    loop: BfAst, // [ ... ]
};
const BfAst = []const BfExpr;

/// Caveats:
/// Stops parsing after the first unmatched ']'
/// Implicitly adds missing ']' at the end of file
fn comptimeParseBfAst(source: *[]const u8) BfAst {
    var bfAst: BfAst = &.{};
    while (source.len > 0) {
        // Consume one character from the front of the remaining source.
        const command = source.*[0];
        source.* = source.*[1..];
        switch (command) {
            '>' => bfAst = bfAst ++ &[1]BfExpr{.incAddr},
            '<' => bfAst = bfAst ++ &[1]BfExpr{.decAddr},
            '+' => bfAst = bfAst ++ &[1]BfExpr{.incMem},
            '-' => bfAst = bfAst ++ &[1]BfExpr{.decMem},
            '.' => bfAst = bfAst ++ &[1]BfExpr{.out},
            ',' => bfAst = bfAst ++ &[1]BfExpr{.inp},
            // Parse the loop body recursively; it returns at the matching ']'.
            '[' => bfAst = bfAst ++ &[1]BfExpr{.{ .loop = comptimeParseBfAst(source) }},
            ']' => return bfAst,
            // Every other character is a comment in BF.
            else => {},
        }
    }
    return bfAst;
}
```
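To see the shape of the resulting AST, here is a small test harness (my own addition for illustration, not part of the original project). It repeats the types and parser from above so the block compiles on its own, then parses `+[->+<].` at comptime, the same way `main` does, and checks that the loop body ends up nested inside a `.loop` node:

```zig
const std = @import("std");

// Types and parser repeated from above so this block is self-contained.
const BfExpr = union(enum) {
    incAddr,
    decAddr,
    incMem,
    decMem,
    inp,
    out,
    loop: BfAst,
};
const BfAst = []const BfExpr;

fn comptimeParseBfAst(source: *[]const u8) BfAst {
    var bfAst: BfAst = &.{};
    while (source.*.len > 0) {
        const command = source.*[0];
        source.* = source.*[1..];
        switch (command) {
            '>' => bfAst = bfAst ++ &[1]BfExpr{.incAddr},
            '<' => bfAst = bfAst ++ &[1]BfExpr{.decAddr},
            '+' => bfAst = bfAst ++ &[1]BfExpr{.incMem},
            '-' => bfAst = bfAst ++ &[1]BfExpr{.decMem},
            '.' => bfAst = bfAst ++ &[1]BfExpr{.out},
            ',' => bfAst = bfAst ++ &[1]BfExpr{.inp},
            '[' => bfAst = bfAst ++ &[1]BfExpr{.{ .loop = comptimeParseBfAst(source) }},
            ']' => return bfAst,
            else => {},
        }
    }
    return bfAst;
}

test "a loop's body is nested inside a .loop node" {
    comptime var src: []const u8 = "+[->+<].";
    const ast = comptime comptimeParseBfAst(&src);
    // Top level: incMem, loop, out
    try std.testing.expectEqual(@as(usize, 3), ast.len);
    try std.testing.expect(std.meta.activeTag(ast[0]) == .incMem);
    // Loop body: decMem, incAddr, incMem, decAddr
    try std.testing.expectEqual(@as(usize, 4), ast[1].loop.len);
    try std.testing.expect(std.meta.activeTag(ast[2]) == .out);
}
```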
Step 3: Generate the runtime code
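Now that we have the AST, generating the runtime code is a no-brainer: an `inline for` over the comptime-known AST emits one statement per node, and each loop node becomes a `while` loop whose body is generated recursively: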
```zig
inline fn genProgram(
    comptime bfAst: BfAst,
    memory: []u8,
    addressPointer: *u16,
    input: anytype,
    output: anytype,
) void {
    // bfAst is comptime-known, so this loop is unrolled into one
    // statement per AST node.
    inline for (bfAst) |bfExpr| switch (bfExpr) {
        .incAddr => addressPointer.* +%= 1,
        .decAddr => addressPointer.* -%= 1,
        .incMem => memory[addressPointer.*] +%= 1,
        .decMem => memory[addressPointer.*] -%= 1,
        // Write errors are silently ignored.
        .out => output.writeByte(memory[addressPointer.*]) catch {},
        .inp => {
            // On a read error (e.g. end of input) the cell keeps its old value.
            const old_memory = memory[addressPointer.*];
            memory[addressPointer.*] = input.readByte() catch old_memory;
        },
        // A loop node becomes a runtime while loop whose body is
        // generated recursively from the nested AST.
        .loop => |loopBody| {
            while (memory[addressPointer.*] != 0) {
                genProgram(loopBody, memory, addressPointer, input, output);
            }
        },
    };
}
```
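To make "transpiling" concrete, take the addition program `++>+++[<+>-]<.`, which adds 2 and 3 and outputs the sum. With its AST known at comptime, the inlined code should amount to roughly the hand-written function below: one statement per command and a plain `while` for the loop. This is an illustration rather than actual compiler output; `addTwoAndThree` is a hypothetical name, and it writes into a byte buffer instead of a writer to stay self-contained:

```zig
const std = @import("std");

/// Hand-written approximation (not compiler output) of what genProgram
/// expands to for the BF program "++>+++[<+>-]<.".
/// Output goes into a byte buffer; returns the number of bytes written.
fn addTwoAndThree(memory: []u8, addressPointer: *u16, output: []u8) usize {
    var written: usize = 0;
    memory[addressPointer.*] +%= 1; // +
    memory[addressPointer.*] +%= 1; // +
    addressPointer.* +%= 1; // >
    memory[addressPointer.*] +%= 1; // +
    memory[addressPointer.*] +%= 1; // +
    memory[addressPointer.*] +%= 1; // +
    while (memory[addressPointer.*] != 0) { // [
        addressPointer.* -%= 1; // <
        memory[addressPointer.*] +%= 1; // +
        addressPointer.* +%= 1; // >
        memory[addressPointer.*] -%= 1; // -
    } // ]
    addressPointer.* -%= 1; // <
    output[written] = memory[addressPointer.*]; // .
    written += 1;
    return written;
}

test "the unrolled program outputs 2 + 3" {
    var memory = [_]u8{0} ** 16;
    var address_pointer: u16 = 0;
    var out: [1]u8 = undefined;
    const n = addTwoAndThree(&memory, &address_pointer, &out);
    try std.testing.expectEqual(@as(usize, 1), n);
    try std.testing.expectEqual(@as(u8, 5), out[0]);
}
```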
Step 4: Putting it all together
Finally, we can use everything above to write the main function:
```zig
const std = @import("std");
const options = @import("options");

pub fn main() !void {
    // One byte per possible u16 address, so the wrapping address
    // pointer can never index out of bounds.
    var memory = [_]u8{0} ** (1 + std.math.maxInt(u16));
    var address_pointer: u16 = 0;

    const stdin = std.io.getStdIn().reader();
    var br = std.io.bufferedReader(stdin);
    const input = br.reader();

    const stdout = std.io.getStdOut().writer();
    var bw = std.io.bufferedWriter(stdout);
    const output = bw.writer();

    // Parsing a large BF program at comptime needs far more than the
    // default 1000 backward branches.
    @setEvalBranchQuota(2000000000);
    comptime var source = options.source;
    const bfAst = comptime comptimeParseBfAst(&source);

    genProgram(bfAst, &memory, &address_pointer, input, output);
    try bw.flush();
}
```
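Since `genProgram` takes `input` and `output` as `anytype`, anything with a `readByte` and a `writeByte` method should work, not just the buffered stdin/stdout wrappers used in `main`. That makes it possible to test generated programs without touching real I/O. Below is a sketch with two hypothetical in-memory types, `SliceReader` and `SliceWriter`, plus a hypothetical `runIo` helper that mirrors only the `.inp` and `.out` cases of `genProgram`, including the rule that a failed read leaves the current cell unchanged:

```zig
const std = @import("std");

/// Hypothetical in-memory input: yields bytes from a slice, then
/// error.EndOfStream, like a reader at end of file.
const SliceReader = struct {
    bytes: []const u8,
    pos: usize = 0,

    fn readByte(self: *SliceReader) !u8 {
        if (self.pos >= self.bytes.len) return error.EndOfStream;
        const byte = self.bytes[self.pos];
        self.pos += 1;
        return byte;
    }
};

/// Hypothetical in-memory output: collects bytes into a fixed buffer.
const SliceWriter = struct {
    buf: []u8,
    len: usize = 0,

    fn writeByte(self: *SliceWriter, byte: u8) !void {
        if (self.len >= self.buf.len) return error.NoSpaceLeft;
        self.buf[self.len] = byte;
        self.len += 1;
    }
};

/// Mirrors genProgram's `.inp` followed by `.out` on a single cell.
fn runIo(cell: *u8, input: anytype, output: anytype) void {
    // `.inp`: keep the old value if reading fails
    const old = cell.*;
    cell.* = input.readByte() catch old;
    // `.out`: write errors are ignored, as in genProgram
    output.writeByte(cell.*) catch {};
}

test "in-memory I/O: echo, then keep the cell at end of input" {
    var reader = SliceReader{ .bytes = "A" };
    var out_buf: [4]u8 = undefined;
    var writer = SliceWriter{ .buf = &out_buf };
    var cell: u8 = 0;

    runIo(&cell, &reader, &writer); // reads 'A', writes 'A'
    runIo(&cell, &reader, &writer); // end of input: cell stays 'A'
    try std.testing.expectEqualStrings("AA", out_buf[0..writer.len]);
}
```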
Step 5: Profit!
Now we can use this project to compile BF programs with the following command:
```sh
zig build -Doptimize=ReleaseFast -- program.b
```
Pros/Cons
Pros:
- All the benefits of transpiling to a different language:
  - No need to know or write assembly.
  - Portability and optimization come for free from the Zig compiler.
- Additionally, there's no need to emit intermediate source files, since everything happens within Zig itself.
Cons:
- Again, basically all the downsides of transpiling to a different language:
  - Slower compilation than with a hand-written compiler.
  - Dependence on the target language (Zig, in this case).
- Additionally, because the transpilation happens inside Zig's comptime, we have little control over memory or time during that phase; about the only knob is the evaluation branch quota, hence the huge `@setEvalBranchQuota` value in `main`. Comptime just wasn't designed for that kind of control.
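The branch quota is the one resource limit comptime does expose. The sketch below (with a hypothetical `countCommands` helper) shows the mechanism: a plain loop over a 5000-character program evaluated at comptime exceeds the default quota of 1000 backward branches, and only compiles because the enclosing `comptime` block raises it. The value 20_000 is my own generous guess, not a computed bound:

```zig
const std = @import("std");

/// Hypothetical helper: counts the BF command characters in `source`.
/// When evaluated at comptime, every loop iteration counts against
/// the evaluation branch quota.
fn countCommands(source: []const u8) usize {
    var n: usize = 0;
    for (source) |c| switch (c) {
        '>', '<', '+', '-', '.', ',', '[', ']' => n += 1,
        else => {},
    };
    return n;
}

test "a long comptime loop needs a raised branch quota" {
    const program = "+-" ** 2500; // 5000 characters
    const n = comptime blk: {
        // Without this line, comptime evaluation would hit the default
        // quota of 1000 backward branches and fail to compile.
        @setEvalBranchQuota(20_000);
        break :blk countCommands(program);
    };
    try std.testing.expectEqual(@as(usize, 5000), n);
}
```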
Conclusion
While this is a fun way to play around with Zig's comptime capabilities, in the opinion of this coder its downsides outweigh its benefits for anything more serious than a recreational project. Still, if you want to experiment with it, you can find all the code here. Cheers!