Visualize C++ Data Structures using Graphviz and the DOT language

Data struc­tures help struc­ture and or­ga­nize data ef­fec­tively, and pro­vide sev­eral ab­stracted op­er­a­tions on the data. They are el­e­gant and con­ve­nient, and com­puter sci­en­tists love to use them. Implementing these data struc­tures in the com­puter re­quire the pro­gram­mer to flat­ten the struc­ture into a one-di­men­sional model us­ing point­ers or ref­er­ences be­cause the mem­ory is ac­tu­ally arranged as a one-di­men­sional run of data el­e­ments.

It is of­ten nec­es­sary to in­spect the data struc­ture in de­bug­ging and pro­gram ver­i­fi­ca­tion phases. But some data struc­tures hap­pen to be par­tic­u­larly graph­i­cal, in that they have mul­ti­ple as­so­ci­a­tions and elab­o­rate hi­er­ar­chies or lay­ers. Think of a Rope or a Graph. Due to many as­so­ci­a­tions be­tween nodes, it’s not pos­si­ble to prop­erly dis­play the graph on the ter­mi­nal.

I re­cently ran into such prob­lem, and found an in­ter­est­ing so­lu­tion. In this post, I will try to doc­u­ment it. I’ve writ­ten mainly about the Abstract Syntax Tree in this post, but it should be triv­ial to trans­late it to any other de­ter­min­is­ti­cally tra­vers­a­ble data struc­ture.

I’m cur­rently work­ing on writ­ing a com­piler for my own Nepali pro­gram­ming lan­guage. The front-end part of a com­piler con­verts the pro­gram code into a data struc­ture called AST. AST, or Abstract Syntax Tree, is a tree-based rep­re­sen­ta­tion of the source code. It’s much more suit­able to fur­ther trans­for­ma­tions in the se­man­tic analy­sis, op­ti­miza­tion and code gen­er­a­tion phases.

This pic­ture of Abstract Syntax Tree taken from Wikipedia page on AST

But AST is also an in­cred­i­bly com­plex data struc­ture. It has to store a bunch of in­for­ma­tion and main­tain a tree struc­ture. It is not im­me­di­ately ob­vi­ous how one might pre­sent it on the ter­mi­nal screen. Several ideas to ex­ist, but they don’t scale well. Two so­lu­tions that I thought up were:

  1. Print in the terminal
  2. Unparse the AST back to source code.

Unparse the AST back to source code

The AST is a loss­less trans­for­ma­tion of the source code as far as in­tent is con­cerned. So it should be pos­si­ble to go the other di­rec­tion and re­trieve the source code back from the AST. If the AST does­n’t have any er­rors, then it’s un­pars­ing should re­sult in code that’s nearly iden­ti­cal to the orig­i­nal source code. This step is also es­sen­tial when try­ing to ver­ify com­piler cor­rect­ness.

I’m go­ing to have to ex­plore this op­tion fur­ther in the fu­ture. But be­cause this method does­n’t al­low me to di­rectly ob­serve the AST at sev­eral stages of the com­pi­la­tion process, I did­n’t use this.

Print in the terminal

I had writ­ten a com­piler for my mi­nor pro­ject in my 6th se­mes­ter but the lan­guage it ac­cepted was very ba­sic. So, back then, I got away with sim­ply print­ing the tree in the ter­mi­nal. For ex­am­ple, this code

gen­er­ates this AST:

The AST is printed in the ter­mi­nal ro­tated 90° counter-clock­wise such that the root of the tree is at the bot­tom-left. This was served me well at the time. Frankly, for a lan­guage that did­n’t even have an else if clause, there’s are lim­ited ways to com­pli­cate the AST other than by writ­ing longer code.

But the lan­guage I’m work­ing on right now is ex­ten­sive. And so the tree it cre­ates is im­mense. I had to come up with some other way to analyse the AST.

Enter Graphviz

Brian Kernighan has given an amaz­ing talk on suc­cess­ful lan­guage de­sign, where he also talked about Pic, a lan­guage he de­signed for spec­i­fy­ing graphs. I was im­me­di­ately taken by the idea be­cause this was a much bet­ter way to make graphs than the clumsy touch­pad.

Well, Brian’s Pic is an­ti­quated now, but Graphviz does the same kind of thing. Graphviz uses the DOT graph spec­i­fi­ca­tion lan­guage. DOT is ex­tremely sim­ple. It lit­er­ally takes 5 min­utes to be able to make nearly any graph you want. For ex­am­ple, con­sider the DOT code be­low

digraph any_other_name{

It pro­duces the Graph:

If you’re in­ter­ested, you should try some on­line Graphviz soft­ware to get the hang of it. I used this one but there are a lot of choices one web search away.

I can only imag­ine how many hours this thing could have saved me the pre­vi­ous se­mes­ter when I was stuck mak­ing elab­o­rate graphs for my mi­nor pro­ject re­port and pre­sen­ta­tion.

Generating the DOT

Generating the DOT for Graphviz is gen­er­ally triv­ial. For linked list, a re­cur­sive func­tion such as the one be­low suf­fices (for c++):

void printElement(listElement* el, int nodeNumber){
  string DOT;
  DOT = "n" + to_string(nodeNumber) + " [label=";
  DOT+= el->name+"\"]\n";

  if (el->right){
    DOT += "n" + to_string(nodeNumber) + "->";
    DOT += "n" + to_string(nodeNumber+1) + "\n";
    printElement(el->right, nodeNumber+1);
  cout << DOT;

It is a lit­tle more com­plex for the AST. Because trees are re­cur­sive struc­tures, it’s easy to write el­e­gant re­cur­sive func­tions that process the trees. In this case, the genDOT(astNode,bool) func­tion helps re­cur­sively de­scend the AST tree, and print out it’s nodes.

int rootNum = 0;
void Parser::genDOT() {
  cout << "digraph G{\n";
  cout << "graph[fontname=Rajdhani, color="#242038"]\n";
  cout << "node[fontname=Rajdhani, color="#242038", shape=square]\n";
  cout << "edge[fontname=Rajdhani, color="#242038"]\n";
  genDOT(root, false);
  cout << "}\n";

void Parser::genDOT(astNode &n, bool isLeft) {
  std::string print;

  // DOT for current node
  print = "n" + to_string(rootNum) + " [label=< <B>" +   translateTypeStr(n->type) + "</B>";
  if (translateDataStr(n) != "")
    print += "<BR/>" + translateDataStr(n) + " >";
    print += " >";
  if (isLeft)
    print += ", color="#FF595E", fontcolor="#FF595E"";
    print += ", color="#274690", fontcolor="#274690"";

  if (n->type == ast::astType::numericConstant ||
      n->type == ast::astType::stringConstant ||
      n->type == ast::astType::boolConstant) {
    print += ", shape=egg";
  print += "]\n";
  int rootNumBackup = rootNum;
  if (n->type == ast::astType::_if) {
    if (n->cond) {
      print += "n" + to_string(rootNumBackup) + "->n" +
        to_string(rootNum) + " [label=condition]\n";
    rootNum += 1;
    if (n->_if) {
      print += "n" + to_string(rootNumBackup) + "->n" +
        to_string(rootNum) + " [label=if]\n";
    rootNum += 1;
    if (n->_elseIf) {
      print += "n" + to_string(rootNumBackup) + "->n" +
        to_string(rootNum) + " [label=elseif]\n";
    rootNum += 1;
    if (n->_else) {
      print += "n" + to_string(rootNumBackup) + "->n" +
        to_string(rootNum) + " [label=else]\n";
    rootNum += 1;
  } else if (n->type == ast::astType::_while) {
    if (n->cond) {
      print += "n" + to_string(rootNumBackup) + "->n" +
        to_string(rootNum) + " [label=condition]\n";
    rootNum += 1;
    if (n->_if) {
      print += "n" + to_string(rootNumBackup) + "->n" +
        to_string(rootNum) + " [label=do]\n";
    rootNum += 1;
  } else {
    if (n->left || n->right) {
      if (n->left) {
        print += "n" + to_string(rootNumBackup) + "->n" +
          to_string(rootNum) + " [color="#FF595E"]\n";
      } else {
        print += "n" + to_string(rootNum) + " [label=x]\n";
        print += "n" + to_string(rootNumBackup) + "->n" +
          to_string(rootNum) + "\n";
      rootNum += 1;
      if (n->right) {
        print += "n" + to_string(rootNumBackup) + "->n" +
          to_string(rootNum) + " [color="#274690"]\n";
        genDOT(n->right, false);
      } else {
        print += "n" + to_string(rootNum) + " [label=x]\n";
        print += "n" + to_string(rootNumBackup) + "->n" +
          to_string(rootNum) + "\n";
  cout << print << endl;

Of course, this is not the com­plete code. It only cov­ers if and while loops, but not func­tion bod­ies and other blocks. But this is the essence. Funnily enough, I spent less time writ­ing code to ac­tu­ally gen­er­ate the graph than I spent try­ing to make it look cooler. But given how much time I’m go­ing to have to stare at and nav­i­gate these graphs, I think the it was worth the ef­fort.

The rootNum keeps track of the par­ent node at each re­cur­sion. The par­ent node needs to point to all it’s the chil­dren, and rootNum helps keep track of the that. The genDOT() func­tion is sim­ply the pub­lic in­ter­face which first prints out the nec­es­sary DOT boil­er­plate and then calls the main genDOT(astNode,bool) function. The genDOT(astnode,bool) func­tion re­cur­sively prints out all the DOT code. Here’s the kind of DOT code it pro­duces:

digraph G{
    graph[fontname=Rajdhani, color="#242038"]
    node[fontname=Rajdhani, color="#242038", shape=square]
    edge[fontname=Rajdhani, color="#242038"]
    n3 [label=< <B>Declare Var</B><BR/>नाम >, color="#FF595E", fontcolor="#FF595E"]
    n2 [label=< <B>chain</B> >, color="#FF595E", fontcolor="#FF595E"]
    n2->n3 [color="#FF595E"]
    n5 [label=x]
    n1 [label=< <B>Declare String </B> >, color="#FF595E", fontcolor="#FF595E"]
    n1->n2 [color="#FF595E"]
    n6 [label=x]
    n10 [label=< <B>dataEl</B><BR/>नाम >, color="#FF595E", fontcolor="#FF595E"]
    n12 [label=< <B>string</B><BR/>नीरब >, color="#274690", fontcolor="#274690", shape=egg]
    n9 [label=< <B>=</B> >, color="#FF595E", fontcolor="#FF595E"]
    n9->n10 [color="#FF595E"]
    n9->n12 [color="#274690"]
    n17 [label=< <B>string</B><BR/>नमस्ते मालिक >, color="#FF595E", fontcolor="#FF595E", shape=egg]
    n16 [label=< <B>chain</B> >, color="#FF595E", fontcolor="#FF595E"]
    n16->n17 [color="#FF595E"]
    n19 [label=x]
    n15 [label=< <B>FunctionCall</B><BR/>लेख >, color="#FF595E", fontcolor="#FF595E"]
    n15->n16 [color="#FF595E"]
    n20 [label=x]
    n14 [label=< <B>Statement<BR/>3</B> >, color="#FF595E", fontcolor="#FF595E"]
    n14->n15 [color="#FF595E"]
    n21 [label=x]
    n8 [label=< <B>If</B> >, color="#FF595E", fontcolor="#FF595E"]
    n8->n9 [label=condition]
    n8->n14 [label=if]
    n7 [label=< <B>Statement<BR/>2</B> >, color="#274690", fontcolor="#274690"]
    n7->n8 [color="#FF595E"]
    n25 [label=x]
    n0 [label=< <B>Statement<BR/>1</B> >, color="#274690", fontcolor="#274690"]
    n0->n1 [color="#FF595E"]
    n0->n7 [color="#274690"]

After the code was work­ing, I wrote a Makefile that saves the out­put, gen­er­ates a PNG im­age of the graph and opens that im­age in the feh im­age viewer.

test_ast: parser/parser.C parser/parser.h tests/parser/parser.C lexer/lexer.C lexer/token.C lexer/characters.C
    $(CC) $(CFLAGS) parser/parser.C parser/ast.C parser/helper.C tests/parser/parser.C lexer/characters.C lexer/token.C lexer/lexer.C file_handler/file_handler.C -o tests/parser/parser.out
    @echo "Starting Parser Test"
    @tests/parser/parser.out > .graph
    dot -Tpng < .graph > .png
    feh .png

This make­file sim­ply redi­rects the DOT out­put from STDOUT to a file, and uses the graphviz pro­gram dot to gen­er­ate a im­age that can be opened with feh.


After the set up, the whole thing just fit into my work­flow.

Here’s an ex­am­ple graph gen­er­ated from the make­file.

So the next time you’re try­ing to de­bug or in­spect a com­plex data struc­ture or a tan­gle of point­ers, try tak­ing some time out to write code which gen­er­ates DOT so that you can in­spect the graph. I tell you, a pic­ture is worth a thou­sand words. The AST be­low is worth much more: