This blog post originally started off as another one of my Unix command deep dives (remember those?), where I dive into the internals of a common Linux command. I was trying to run strace
to determine the system calls that were invoked by the command that I was exploring when I recalled that I was on macOS and that macOS did not have strace
but instead used a tool called dtruss
to track syscalls invoked during the execution of a program.
Now, I've been relatively ignorant about the distinction between strace
and dtruss
before. All I knew was that dtruss
did what I needed it to do and I didn't much bother looking into the details of how it worked or what it was.
But today is the day to shed the cloak of ignorance, friends!
What is strace?
strace
is a system call tracer and also one of the few things in tech that has a name that reasonably matches what it does. You might be familiar with strace
from Julia Evans' strace zine. I think Julia's zine is a great way to learn about strace
but here's my two-point summary of what strace is.
- System calls are an interface that allows a program to request some functionality from the operating system. These system calls do things like changing the current working directory, changing the permissions on files, and so on. You can view a full list of system calls here.
- strace lists out the system calls that a program invokes as it executes.
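To make the first point a bit more concrete, here is a minimal sketch in Python, whose os module is a thin wrapper around many system calls. The function name demo_syscalls and the file name example.txt are made up for illustration, and the syscall names in the comments are the typical ones; the exact calls your platform issues (openat instead of open, for example) may differ.

```python
import os
import stat
import tempfile


def demo_syscalls():
    """Exercise a few system calls through Python's thin os wrappers."""
    original_dir = os.getcwd()  # typically getcwd(2)
    with tempfile.TemporaryDirectory() as tmp:
        os.chdir(tmp)  # chdir(2): change the current working directory
        try:
            # open(2) or openat(2): create a file in the new directory
            fd = os.open("example.txt", os.O_CREAT | os.O_WRONLY, 0o644)
            os.write(fd, b"hello")  # write(2)
            os.close(fd)  # close(2)
            os.chmod("example.txt", 0o600)  # chmod(2): change permissions
            info = os.stat("example.txt")  # stat(2) or a close relative
            mode = stat.S_IMODE(info.st_mode)
            size = info.st_size
        finally:
            # Leave the temporary directory before it is cleaned up.
            os.chdir(original_dir)
    return mode, size
```

Running a script like this under a system call tracer should show each of these requests crossing into the operating system, mixed in with the many calls the Python interpreter makes on its own.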
One thing that the zine doesn't go into is how strace
works under the hood. I'll dive into that here. Under the hood, strace
leverages ptrace
, which stands for process trace, a system call that allows a parent process to watch and control the execution of a child process. It's used in strace
, but it also enables things like the gdb
debugger. The ptrace
system call uses some internal Linux data structures to establish a relationship between the tracer (the parent process) and the tracee (the child process). Whenever the tracee enters or exits a system call, it is temporarily stopped and the tracer is notified. At this point, whatever program is invoking ptrace
, whether it is strace
or gdb
, will process the information about the system call it was notified of and then return control to the child process. This jumping back and forth between a child process, ptrace, and a higher-level program highlights one of the downsides of strace
. Because the operating system has to switch contexts between several processes repeatedly, strace
can slow the traced program down noticeably.
In summary, ptrace
acts as a mediator between the running process and a higher-level tool such as gdb
or strace
.
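Here is a rough sketch of that tracer and tracee dance, written in Python with ctypes calling the C library's ptrace wrapper. It is Linux-only and makes several assumptions: the request numbers are the Linux values from sys/ptrace.h, the function name count_syscall_stops is made up for illustration, and it only counts syscall stops rather than decoding names and arguments the way strace does. Treat it as a toy, not as how strace is actually implemented.

```python
import ctypes
import os
import signal
import sys

# Request numbers from <sys/ptrace.h> on Linux (assumed; other systems differ).
PTRACE_TRACEME = 0
PTRACE_SYSCALL = 24
PTRACE_SETOPTIONS = 0x4200
PTRACE_O_TRACESYSGOOD = 0x1


def count_syscall_stops(argv):
    """Run argv under ptrace and count its syscall-entry and syscall-exit stops."""
    if not sys.platform.startswith("linux"):
        raise RuntimeError("this sketch relies on Linux ptrace request numbers")

    libc = ctypes.CDLL(None, use_errno=True)
    libc.ptrace.restype = ctypes.c_long
    libc.ptrace.argtypes = [ctypes.c_int, ctypes.c_int,
                            ctypes.c_void_p, ctypes.c_void_p]

    pid = os.fork()
    if pid == 0:
        # Child (tracee): ask to be traced, then become the target program.
        try:
            libc.ptrace(PTRACE_TRACEME, 0, None, None)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    # Parent (tracer): the tracee stops with SIGTRAP once exec succeeds.
    _, status = os.waitpid(pid, 0)
    if not os.WIFSTOPPED(status):
        return 0  # tracing was refused or exec failed

    # Report syscall stops as SIGTRAP | 0x80 so they differ from real signals.
    libc.ptrace(PTRACE_SETOPTIONS, pid, None, PTRACE_O_TRACESYSGOOD)

    stops = 0
    pending_signal = 0
    while True:
        # Resume the tracee until its next syscall entry or exit.
        if libc.ptrace(PTRACE_SYSCALL, pid, None, pending_signal) == -1:
            os.kill(pid, signal.SIGKILL)
            os.waitpid(pid, 0)
            return stops
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status) or os.WIFSIGNALED(status):
            return stops
        stop_signal = os.WSTOPSIG(status)
        if stop_signal == (signal.SIGTRAP | 0x80):
            stops += 1  # the tracee is paused at a syscall boundary
            pending_signal = 0
        else:
            pending_signal = stop_signal  # forward ordinary signals
```

Calling count_syscall_stops(["true"]) on a Linux machine that permits ptrace should return a positive number. Every one of those stops is a round trip from the tracee to the kernel to the tracer and back, which is exactly where the overhead described above comes from.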
What is dtrace?
Now, this is where I had to do a little bit of research. The first definition I found of dtrace
was on Brendan Gregg's website which defined dtrace
, or I guess I can call it DTrace, as "an implementation of dynamic tracing." What is dynamic tracing? I had to do quite a bit of digging to find a resource that explained this well. In the end, I came across this article, which helped me grok what was going on.
Whereas strace
relies on ptrace
to introspect processes, dtrace
goes about things a little bit differently. With dtrace
, the programmer writes probes in a language with a C-like syntax called D. These probes define what dtrace
should do when the traced program invokes a system call, exits a function, or hits whatever other event you'd like. These probes are stored in a script file that looks something like this.
/* Fire each time a process enters the read system call. */
syscall::read:entry {
    printf("read has been called.\n");
}
This script states that whenever the read
system call is invoked, the tracer should print out the string "read has been called." The script file is then invoked with dtrace like so.
$ dtrace -s my_probe.d
dtrace
then invokes the logic within the probe whenever it encounters the event outlined in that probe (entering a certain system call, exiting a function, and so on). This flexibility lends DTrace its title as a dynamic tracer.
What is dtruss?
The next thing I set out to uncover was what dtruss
was. The first definition I ran into was from the dtruss
manpage which defined dtruss
as "a DTrace version of truss." Well, I guess I better figure out what truss is first then. As it turns out, truss
is a Unix-specific command that allows the user to print out the system calls made by a program. It's essentially a variant of the strace
tool that exists on Linux. Knowing this, I think the best way to describe it would be to use an analogy: strace
is to ptrace
as dtruss
is to dtrace
.
What other tracing tools exist?
Now, as it turns out, strace
and dtrace
aren't the only tools in our toolkit of tracers. My investigation eventually led me to explore the wider world of tracers. As it turns out, Julia comes to the rescue once again. Brendan Gregg has another blog post with a list of different Linux tracers, how they work, and when you can use them. Brendan seems like quite the authority in this space, having published several books on tracing and written many nice blog posts. If you're interested in diving deeper into this, I would recommend checking out some of his blog posts.
Conclusion
Well, wasn't that a fun slide down the iceberg? It's always satisfying when you start by posing a simple question (what is the difference between strace and DTrace?) and end up discovering something much bigger (a whole new world of tracers).
What tracer do you use on a regular basis? Is there a particular tracing tool that you prefer over others?