LLDB Custom Data Formatters for C in Python

Paul J. Lucas - Nov 2 '20 - Dev Community

Introduction

If you use LLDB to debug C or C++ programs, you can customize the output of LLDB’s print (p) command to print the contents of variables in a more enlightening way than the default. You can even use Python to do it. However, pretty much all the examples I could find showed only relatively trivial data structures. After lots of searching and trial-and-error, I’ve figured out how to use Python to print a non-trivial struct.

Data Structures

The C data structures that I want to print are from cdecl and include the following to implement a singly linked list (somewhat trimmed for this example):

typedef struct slist      slist_t;
typedef struct slist_node slist_node_t;

struct slist {
  slist_node_t *head;           // Pointer to list head.
  slist_node_t *tail;           // Pointer to list tail.
};

struct slist_node {
  slist_node_t *next;           // Pointer to next node or null.
  void         *data;           // Pointer to user data.
};

This is then used to implement storing C++ scoped names, e.g., S::T:

typedef struct slist c_sname_t; // C++ scoped name.

where each scope’s name is stored in an slist_node whose data member actually points to a c_scope_data holding information about that scope, in particular its name:

struct c_scope_data {
    char const *name;           // The scope's name.
    // ...
};

For S::T, there would be one slist with two nodes: one for S and the second for T.

When debugging in LLDB and printing a variable of type c_sname_t, I want LLDB to print the entire scoped name S::T, i.e., traverse the list printing every node’s name and not just the default of printing the head and tail pointers of the slist itself.
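
To make the goal concrete before involving LLDB at all, here is a minimal pure-Python sketch of the same list and of the traversal I want. The class names (ScopeData, SlistNode, Slist) and the scoped_name function are hypothetical stand-ins for the C structs, not part of cdecl or LLDB:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScopeData:
    """Stand-in for struct c_scope_data: only the name matters here."""
    name: str


@dataclass
class SlistNode:
    """Stand-in for struct slist_node."""
    data: Optional[ScopeData]
    next: Optional["SlistNode"] = None


@dataclass
class Slist:
    """Stand-in for struct slist (and therefore c_sname_t)."""
    head: Optional[SlistNode] = None
    tail: Optional[SlistNode] = None


def scoped_name(sname: Slist) -> str:
    """Walk the list from head and join every scope's name with '::'."""
    names = []
    node = sname.head
    while node is not None:
        if node.data is not None:
            names.append(node.data.name)
        node = node.next
    return "::".join(names)


# Build the two-node list for S::T.
t_node = SlistNode(ScopeData("T"))
s_node = SlistNode(ScopeData("S"), next=t_node)
sname = Slist(head=s_node, tail=t_node)

print(scoped_name(sname))  # prints S::T
```

The formatter has to do the same walk, except that every pointer is an LLDB value rather than a Python reference.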

LLDB Python Modules

To do this, I implemented an LLDB Python module. Such a module starts with:

# cdecl_lldb.py

import lldb

def __lldb_init_module(debugger, internal_dict):
    cmd_prefix = 'type summary add -F ' + __name__
    debugger.HandleCommand(cmd_prefix + '.show_c_sname_t c_sname_t')

LLDB calls the __lldb_init_module function once when it imports the module. The debugger parameter is the SBDebugger instance that loaded the module; internal_dict is required by the signature, but you never need to interact with it. You can use the function to bind C types to custom formatters.

The lines:

    cmd_prefix = 'type summary add -F ' + __name__
    debugger.HandleCommand(cmd_prefix + '.show_c_sname_t c_sname_t')

declare cmd_prefix as shorthand for use on subsequent lines (where __name__ is the module’s name, i.e., the name of the current Python file without the .py extension). HandleCommand then runs the LLDB command type summary add, whose -F option names the Python function, here show_c_sname_t, that produces the summary for the type c_sname_t. (Yes, typedef’d names are fine.)
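
If a module ends up with several formatters, the same command can be built in a loop. The following is only a sketch: summary_command, register_formatters, the FORMATTERS table, and the RecordingDebugger class are hypothetical names of mine, and RecordingDebugger merely stands in for the real debugger object so the example runs outside LLDB:

```python
def summary_command(module_name, func_name, type_name):
    """Build a 'type summary add -F' command for one type."""
    return "type summary add -F {}.{} {}".format(module_name, func_name, type_name)


class RecordingDebugger:
    """Hypothetical stand-in for the debugger: records commands instead of running them."""

    def __init__(self):
        self.commands = []

    def HandleCommand(self, command):
        self.commands.append(command)


# Hypothetical table mapping each C type name to its formatter function name.
FORMATTERS = {
    "c_sname_t": "show_c_sname_t",
}


def register_formatters(debugger, module_name):
    """Issue one 'type summary add' command per entry in FORMATTERS."""
    for type_name, func_name in FORMATTERS.items():
        debugger.HandleCommand(summary_command(module_name, func_name, type_name))


debugger = RecordingDebugger()
register_formatters(debugger, "cdecl_lldb")
print(debugger.commands[0])
# prints: type summary add -F cdecl_lldb.show_c_sname_t c_sname_t
```

Inside a real module, __lldb_init_module would call register_formatters(debugger, __name__) with the debugger it was given.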

Before we get to the implementation of show_c_sname_t, we’ll need a utility function to help traverse the data structure:

def null(ptr):
    """Gets whether the SBValue is a NULL pointer."""
    return not ptr.IsValid() or ptr.GetValueAsUnsigned() == 0

(An SBValue is an LLDB data structure that stores the contents of a variable, register, or expression.)
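
The two halves of that test cover different situations: an invalid value (for example, what GetChildMemberWithName returns when the member can’t be found) and a valid pointer whose value is zero. To show both without a debugger, this sketch repeats null() and feeds it a hypothetical FakePointer class that implements only the two SBValue methods null() calls:

```python
def null(ptr):
    """Gets whether the SBValue is a NULL pointer."""
    return not ptr.IsValid() or ptr.GetValueAsUnsigned() == 0


class FakePointer:
    """Hypothetical stand-in for an SBValue holding a pointer."""

    def __init__(self, address, valid=True):
        self._address = address
        self._valid = valid

    def IsValid(self):
        return self._valid

    def GetValueAsUnsigned(self):
        return self._address


print(null(FakePointer(0x1000)))               # prints False
print(null(FakePointer(0)))                    # prints True
print(null(FakePointer(0x1000, valid=False)))  # prints True
```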

The implementation of show_c_sname_t starts with:

def show_c_sname_t(c_sname, internal_dict):
    colon2 = False          # Print "::" scope separator?
    rv = ""                 # "string-ified" value to return

Since a c_sname_t is a typedef for an slist, we first need to get at the list’s head, then traverse the list. As it happens, SBValue has a linked_list_iter() method that traverses a linked list given the name of the struct member containing the next pointer:

    head = c_sname.GetChildMemberWithName('head')
    for slist_node_ptr in head.linked_list_iter('next'):
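
As far as I understand its default behavior, linked_list_iter() yields the starting value itself, then keeps following the named member until it reaches an invalid or zero-valued pointer. The sketch below models that loop outside LLDB; walk and FakeNode are hypothetical names, with FakeNode standing in for an SBValue that holds an slist_node_t*:

```python
class FakeNode:
    """Hypothetical stand-in for an SBValue holding a slist_node_t*."""

    def __init__(self, address, next_node=None):
        self.address = address
        self.next_node = next_node

    def IsValid(self):
        return True

    def GetValueAsUnsigned(self):
        return self.address

    def GetChildMemberWithName(self, name):
        # Only 'next' is modeled; anything else, or a missing next node,
        # comes back as a null pointer (address 0).
        if name == 'next' and self.next_node is not None:
            return self.next_node
        return FakeNode(0)


def walk(head, next_name):
    """Yield head, then each value reached through next_name, until null."""
    node = head
    while node.IsValid() and node.GetValueAsUnsigned() != 0:
        yield node
        node = node.GetChildMemberWithName(next_name)


t = FakeNode(0x2000)
s = FakeNode(0x1000, next_node=t)
print([hex(n.GetValueAsUnsigned()) for n in walk(s, 'next')])
# prints ['0x1000', '0x2000']
```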

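For each node, we have to get its data member, then cast the void* to a c_scope_data_t* (this assumes the typedef c_scope_data_t for struct c_scope_data, which the trimmed listing above doesn’t show). To do the cast, we need to get a hold of the c_scope_data_t* type from within Python. That can be done once prior to the loop with: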

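    # Ask the value for its target: the global lldb.debugger convenience
    # variable is only reliable in the interactive script interpreter.
    target = c_sname.GetTarget()
    c_scope_data_ptr_t = target.FindFirstType('c_scope_data_t').GetPointerType()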

The complete remainder of the function is now:

    head = c_sname.GetChildMemberWithName('head')
    for slist_node_ptr in head.linked_list_iter('next'):
        void_data_ptr = slist_node_ptr.GetChildMemberWithName('data')
        if not null(void_data_ptr):
            # Reinterpret the void* as a c_scope_data_t* to reach its members.
            c_scope_data_ptr = void_data_ptr.Cast(c_scope_data_ptr_t)
            name_ptr = c_scope_data_ptr.GetChildMemberWithName('name')
            if not null(name_ptr):
                if colon2:              # Not the first name: add a separator.
                    rv += '::'
                else:
                    colon2 = True
                # GetSummary() returns the C string wrapped in double quotes.
                rv += name_ptr.GetSummary().strip('"')

    return '"' + rv + '"'

Given name_ptr, we use GetSummary() to get the actual name string. GetSummary() puts quotes around its result that we must strip because we want quotes around the entire return value, not each scope’s name.
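
One caveat: as far as I know, GetSummary() can return None when LLDB can’t produce a summary (say, for a pointer into unreadable memory), in which case calling strip() would raise an exception. Here is a sketch of a more defensive way to do the quoting and joining; unquote and join_scopes are hypothetical helper names, and the summaries are passed in as plain strings so the example runs without LLDB:

```python
def unquote(summary):
    """Strip the quotes LLDB puts around a C string summary; None becomes ''."""
    if summary is None:
        return ""
    return summary.strip('"')


def join_scopes(summaries):
    """Join the unquoted, non-empty scope names with '::' and quote the result."""
    names = [unquote(s) for s in summaries]
    return '"' + "::".join(n for n in names if n) + '"'


print(join_scopes(['"S"', '"T"']))  # prints "S::T"
print(join_scopes(['"S"', None]))   # prints "S"
```

Collecting the names in a list and joining them at the end also removes the need for the colon2 flag.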

Loading a Module

To get LLDB to load the module every time it starts, I first created a .lldbinit file in cdecl’s src directory:

# .lldbinit
command script import cdecl_lldb.py

In order to get LLDB to load it, you also need to allow loading of .lldbinit files from the current working directory. To do that, add:

settings set target.load-cwd-lldbinit true

to your global ~/.lldbinit file.

Using Python Interactively

While working out the correct Python, you don’t have to edit your script, save it, launch LLDB, set a breakpoint, and run your program every time (the “edit-compile-run loop”). Instead, you can drop into a Python interpreter from within LLDB and experiment there until you find the code that prints a variable the way you want. For example, given a function that contains:

c_sname_t const *sname = find_name( /*...*/ );
do_something( sname );

you can set a breakpoint on do_something() and run your program. When the breakpoint is hit and you’re dropped into LLDB, you can further drop into a Python interpreter via the script command:

(lldb) script
Python Interactive Interpreter. To exit, type 'quit()', 'exit()' or Ctrl-D.
>>>

The first thing you have to do is get a hold of the sname variable from Python, more specifically from within the current stack frame:

>>> sname = lldb.frame.FindVariable('sname')

Now you can try out Python code interactively, shortening the edit-compile-run loop.
