ANTLR4 and Prolog Code Comprehension (AI With Esoteric Tools!)

Introduction

Spaghetti code can be unnerving.  It can be very difficult to determine where a bug occurs because variable assignments can be spread over hundreds or thousands of lines of code.  In this article, I’ll:

  • Present a small program that is tending toward spaghetti code
  • Show how to use ANTLR4 to emit Prolog code to unravel it
  • Show how to use Prolog to find variable dependencies

First, let’s look at a small program.

p = 32.0
i = j - ( k * 36.0 )
n = 0.034 * i
h = 0.05 * i
m = n + p
e = f * ( g / h )
c = d + e
a = b + c

Now, let’s say you have this program and are debugging it to find out why the value of a is wrong. In only eight lines, we have a mess, and a depends on more variables than will fit into our short-term memory. So I challenge you to determine all the variables that affect a without at least a pencil and paper. Done? Okay, that wasn’t that hard, but what if the code were 800 or 8000 or 80,000 lines long? That would be a very difficult task, wouldn’t it?

The Solution

What we need is something to learn all dependencies of a, and then to be able to ask it questions.

The Grammar

ANTLR4 is a tool for language recognition.  It is employed to create domain-specific languages and interpreters for existing ones.   In our example we’re going to use a very simple grammar that is just variable assignments and simple arithmetic expressions.

grammar Arithmetic;
program : statement+ EOF;
statement : assignment_statement # Assignment
; 
assignment_statement : LHS = VARIABLE OP = ASSIGNMENT RHS = expr # AssignmentStatement
;
expr
    :   '('   Exp = expr ')'                            # Parens
    |   MINUS Exp = expr                                # UnaryMinus
    |   LHS = expr Op = (TIMES | DIV)  RHS = expr    # MulDiv
    |   LHS = expr Op = (PLUS  | MINUS) RHS = expr   # AddSub
    |   (VARIABLE | CONSTANT)                           # Element
    ;
ASSIGNMENT  :   '=' ;
PLUS        :   '+' ;
MINUS       :   '-' ;
TIMES       :   '*' ;
DIV         :   '/' ;
LPAREN      :   '(' ;
RPAREN      :   ')' ;
VARIABLE    :   LETTER+(LETTER|DIGIT|'_')* ;
CONSTANT    :   NUMBER ;
NUMBER      :   DIGIT+'.'DIGIT+ ;
LETTER      :   ('a' .. 'z') ;
DIGIT       :   ('0' .. '9')    ;
WS          :   [ \r\n\t] + -> skip ;

This is an ANTLR4 grammar file with parser and lexer rules combined into one file. It it used by the ANTLR engine of our choosing (we’ll be using the C# engine) to implement language features.

Instantiating ANTLR

I chose to make this example as simple as possible, and so I built a Windows console application. Here is my program:

internal class Program
{
  private static void Main()
  {
    string programCode = File.ReadAllText(@"D:\dev\prolog\PrologAntlr2\test_001.txt");
    using (TextReader stream = new StringReader(programCode))
    {
      AntlrInputStream inputStream = new AntlrInputStream(stream);
      ArithmeticLexer lexer = new ArithmeticLexer(inputStream);
      CommonTokenStream tokenStream = new CommonTokenStream(lexer);
      ArithmeticParser parser = new ArithmeticParser(tokenStream);
      ArithmeticParser.ProgramContext context = parser.program();
      ProgramListener listener = new ProgramListener();
      ParseTreeWalker walker = new ParseTreeWalker();
      bool done = parser.BuildParseTree;
      if (done)
        walker.Walk(listener, context);
      else
        Console.WriteLine("ERROR: parser.BuildParseTree failed.");
      Console.ReadKey();
    }
  }
}

This is a fairly standard way of setting up an ANTLR listener pattern. The end result is that the tree walker will start walking the supplied source code at the start point called program. It will lexically analyze and parse the code and generate events that can be overridden to respond to things that happen during the source code tree walk. There are many good tutorials on the internet about this wonderful toolkit.

The Listener

In ANTLR, the we inherit from a base class generated by the tool to implement the Listener. We override certain events that are interesting to us to achieve the effect we want. Here is the C# Listener class:

using System;
using System.Collections.Generic;
using System.IO;
using Antlr4.Runtime.Tree;

namespace PrologAntlr2
{
  public static class Environment
  {
    public static string LHS = "";
    public static List RHSvars = new List();
    public static void Clear()
    {
      LHS = "";
      RHSvars.Clear();
    }
  }

  public class ProgramListener : ArithmeticBaseListener
  {
    private const string PrologFilename = @"D:\dev\prolog\dependencies.pl";

    public override void EnterProgram(ArithmeticParser.ProgramContext context)
    {
      // Delete the generated prolog file each time we run this program.
      File.Delete(PrologFilename);
    }

    public override void EnterAssignmentStatement(ArithmeticParser.AssignmentStatementContext context)
    {
      // Clear the environment for every assignment statement, then
      // remember the variable on the left hand side of the assignment.
      Environment.Clear();
      Environment.LHS = context.LHS.Text;
    }

    public override void EnterElement(ArithmeticParser.ElementContext context)
    {
      // Collect right hand side variables for prolog facts.
      ITerminalNode itn = context.VARIABLE();
      if (itn != null)
        Environment.RHSvars.Add(context.GetText());
    }

    public override void ExitAssignmentStatement(ArithmeticParser.AssignmentStatementContext context)
    {
      // Write each prolog fact.
      foreach (string s in Environment.RHSvars)
      {
        File.AppendAllText(PrologFilename, "dep(" + Environment.LHS + ", " + s + ")." + System.Environment.NewLine);
      }
    }

    public override void ExitProgram(ArithmeticParser.ProgramContext context)
    {
      // Write prolog variable dependency rules to the end of the file.
      const string prologRules = @"
depends(X, Y) :- dep(X, Y).
depends(X, Y) :- dep(X, Z), depends(Z, Y).";
      File.AppendAllText(PrologFilename, prologRules);
      Console.WriteLine("DONE");
    }
  }
}

The comments explain the code well, but if you’re new to ANTLR and Prolog you’ll just have to trust me here. The main goal of the Listener above is to emit syntactically correct Prolog code that we can use to interrogate about variable dependencies.

The Prolog Output

When we run the console application, here’s what the Listener emits:

dep(i, j).
dep(i, k).
dep(n, i).
dep(h, i).
dep(m, n).
dep(m, p).
dep(e, f).
dep(e, g).
dep(e, h).
dep(c, d).
dep(c, e).
dep(a, b).
dep(a, c).

depends(X, Y) :- dep(X, Y).
depends(X, Y) :- dep(X, Z),	depends(Z, Y).

This is a fully functional Prolog program, with facts about dependencies collected by the Listener, and rules about how to act on those facts. These two rules called depends in our case say that:

    • X is dependent on Y if there is a fact above that says so
    • X is dependent on Z is there exists some intermediate variable Y on which Y depends, recursively

This recursive rule harnesses the power of Prolog to backtrack with its inference engine and find all solutions that will satisfy the question, namely, on what variables does X depend? So the dep() facts in the Prolog program above represent — and can be hand-checked — direct dependencies between variables in one assignment statement, and the recursive dependency rules will allow us to query Prolog to see what further, indirect dependencies exist.

The Prolog Side

I’m using SWI Prolog 7.6.4 here because it’s free and capable. Let’s use Prolog now to see some dependencies. The first thing we do is tell the Prolog environment to consult a code file:

?- consult('D:\\dev\\prolog\\dependencies.pl').

This loads the file we just generated into the Prolog environment and allows us to work with it.

Querying Prolog

Let’s start with an easy one. Let’s see what variables affect the value of i:

?- findall(X, depends(i, X), List), print(List), nl.

This means, “find all solutions X such that X depends on i and put it in the list called List, then print the list and a newline.” If we run it we get:

[j,k]
List = [j, k].

And so as it’s pretty clear from line 2 of our program, yes, i depends on the value of j and k. What if we were to ask about m?

?- findall(X, depends(m, X), List), print(List), nl.
[n,p,i,j,k]
List = [n, p, i, j, k].

Clearly from line 5, m depends on n and p, but n also depends on i, which has other dependencies as we just saw. So the list includes all five dependencies. Now, for the hard one we started with — what affects the value of a?

?- findall(X, depends(a, X), List), print(List), nl.
[b,c,d,e,f,g,h,i,j,k]
List = [b, c, d, e, f, g, h, i, j|...].

So a depends on ten different variables for its value, in this tiny example.

Conclusion

While ANTLR can be used to create interpreters and domain-specific languages, its parsing abilities have other creative uses, too. In this case we are literally transforming code from our simple grammar into a Prolog program that represents the dependencies among variables. Imagine how powerful something like this could be with a non-trivial grammar and spaghetti code that is many thousands of lines, when it comes debugging time, or refactoring time! I hope you enjoyed this foray into two rather esoteric uses of non-mainstream languages and tools to see what can be done with a little creativity. Thanks for reading.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s