ANTLR 4: Making a compiler with the JavaScript runtime
Demonstration of the implementation of a programming language using ANTLR 4. Source code included.
Writing the compiler in JavaScript gives us a portable tool that runs with Node on any system. The demonstration explains how to use ANTLR 4 to generate the compiler from the grammar of "calc", a small language that performs some arithmetic operations.
- The calc language has rules for assigning an expression to a variable and displaying a result.
- The compiler loads a .calc file containing a program.
- It saves the target code in a file with a .js extension.
- The program written in calc is compiled to JavaScript in the demo. You can change the implementation for a different target language, such as wasm, C or a bytecode.
Installing ANTLR 4 and the JavaScript Runtime
What you need:
- ANTLR 4, the lexer and parser generation tool. Download antlr-4.7-complete.jar from antlr.org, in the "Development tools" section.
- Node.js to run the command line code. The package includes npm.
- The JavaScript runtime. The simplest way to install it is the npm install antlr4 command.
There is nothing else to install: this is the advantage of the JavaScript version. Users of the language need only Node and the compiler.
Defining a grammar
With ANTLR 4 there is, according to the authors, no limit on the complexity of the grammar that can be defined.
But for the calc language we will start with a simple grammar ...
grammar calc;
program:
    (
        print
        | assign
        | emptyline
    )*
    ;
assign:
    VARIABLE (EQCOL | EQ) expression
    EOL
    ;
print:
    PRINT expression EOL
    ;
condition:
    expression relop expression
    ;
expression:
    multiplyingExpression ((PLUS | MINUS) multiplyingExpression)*
    ;
...
EQCOL:
    ':='
    ;
EOL:
    [\r\n]+
    ;
WS:
    [ \t]+ -> skip
    ;
The first grammar rule is program, which references print, assign and emptyline.
The generator produces a function for each rule in the grammar, so to start parsing, the compiler calls the program() function, then uses ParseTreeWalker to activate the listener.
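To make the role of the walker concrete, here is a minimal sketch, in plain JavaScript and without the ANTLR runtime, of how a depth-first walk can fire enter and exit callbacks named after grammar rules. The node shape ({ rule, children }) and the walk and capitalize functions are simplifications invented for this illustration; the real ParseTreeWalker works on the context objects built by the generated parser.

```javascript
// Illustration only: a toy tree walker, not the ANTLR runtime.
// A node is { rule: "ruleName", children: [...] } (hypothetical shape).
function capitalize(name) {
    return name.charAt(0).toUpperCase() + name.slice(1);
}

function walk(listener, node) {
    const enter = listener["enter" + capitalize(node.rule)];
    const exit = listener["exit" + capitalize(node.rule)];
    // call enterXxx before visiting the children, exitXxx after
    if (typeof enter === "function") enter.call(listener, node);
    for (const child of node.children || []) {
        walk(listener, child);
    }
    if (typeof exit === "function") exit.call(listener, node);
}

// A tree for a program made of one assignment and one print statement.
const sampleTree = {
    rule: "program",
    children: [
        { rule: "assign", children: [] },
        { rule: "print", children: [] }
    ]
};

// Record the order in which the callbacks are fired.
const events = [];
const recorder = {
    enterProgram() { events.push("enterProgram"); },
    exitProgram() { events.push("exitProgram"); },
    exitAssign() { events.push("exitAssign"); },
    exitPrint() { events.push("exitPrint"); }
};

walk(recorder, sampleTree);
console.log(events.join(" "));
```

The recorded order is enterProgram, exitAssign, exitPrint, exitProgram: the program rule is entered first and exited last, which is why the demo opens the target file in enterProgram and saves it in exitProgram.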
Generating the compiler
From the grammar, the compiler is generated with a java command and the -Dlanguage=JavaScript option (the ANTLR jar must be on the Java classpath). Example for the calc language:
java org.antlr.v4.Tool -Dlanguage=JavaScript calc.g4
To save typing, you can put the command in a batch file, like the ant.bat file for Windows included in the demo.
This command creates the following files:
- calcLexer.js
- calcParser.js
- calcListener.js
- as well as calc.tokens and calcLexer.tokens.
The main part of the compiler fits in a few lines ...
const antlr4 = require("antlr4/index")
const fs = require("fs")
const calcLexer = require("./calcLexer.js")
const calcParser = require("./calcParser.js")
const JSListener = require("./JSListener.js").JSListener
const iName = process.argv[2]
JSListener.tFileName = iName.replace(".calc", ".js")
console.log("Compiling " + iName + " to " + JSListener.tFileName)
var input = fs.readFileSync(iName, 'UTF-8')
var chars = new antlr4.InputStream(input)
var lexer = new calcLexer.calcLexer(chars)
var tokens = new antlr4.CommonTokenStream(lexer)
var parser = new calcParser.calcParser(tokens)
parser.buildParseTrees = true
var tree = parser.program()
var extractor = new JSListener()
antlr4.tree.ParseTreeWalker.DEFAULT.walk(extractor, tree)
We read the source file with readFileSync. Its content is passed to the lexer, which in turn provides a stream of tokens to the parser. The parser builds a tree that is traversed by ParseTreeWalker.
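To give an idea of what the lexer stage hands to the parser, the following sketch tokenizes one line of calc by hand. It is not the generated calcLexer: the tokenize function and the tokenSpec table are hypothetical, and the NUMBER and VARIABLE patterns are assumptions, since the grammar excerpt above does not show them.

```javascript
// Illustration only: a hand-written tokenizer, not the generated calcLexer.
// The NUMBER and VARIABLE patterns are assumptions; the grammar excerpt
// in the article does not show them.
const tokenSpec = [
    ["WS", /^[ \t]+/],          // dropped, like "-> skip" in the grammar
    ["EOL", /^[\r\n]+/],
    ["EQCOL", /^:=/],
    ["EQ", /^=/],
    ["PLUS", /^\+/],
    ["MINUS", /^-/],
    ["PRINT", /^print\b/],
    ["NUMBER", /^[0-9]+/],
    ["VARIABLE", /^[A-Za-z_][A-Za-z0-9_]*/]
];

function tokenize(source) {
    const tokens = [];
    let rest = source;
    while (rest.length > 0) {
        let matched = false;
        // try each pattern in order; the first match wins
        for (const [type, pattern] of tokenSpec) {
            const m = pattern.exec(rest);
            if (m) {
                if (type !== "WS") tokens.push({ type, text: m[0] });
                rest = rest.slice(m[0].length);
                matched = true;
                break;
            }
        }
        if (!matched) throw new Error("Unexpected character: " + rest[0]);
    }
    return tokens;
}

const tokens = tokenize("x := 10 + 5\n");
console.log(tokens.map(t => t.type).join(" "));
```

For the first line of the sample program this prints VARIABLE EQCOL NUMBER PLUS NUMBER EOL, which is the sequence the assign rule expects.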
Developing a listener
ANTLR automatically generates a listener, calcListener, which it saves in the calcListener.js file. It contains an enter and an exit function for each rule in the grammar. What remains is to add, in these listener functions, the instructions that generate the target code for each rule.
But we will not place them in the calcListener.js file. Each time the grammar is modified and the generation command is run again, this file is overwritten and any added content would be lost. That is why we create another file, JSListener.js, which defines a listener derived from calcListener and overrides its functions.
var antlr4 = require('antlr4/index');
const calcListener = require('./calcListener.js').calcListener
const fs = require("fs")
const print = console.log
// include the implementation of the compiler directly
eval(fs.readFileSync("implement.js", "UTF-8"))
// JSListener inherits from the generated calcListener
var JSListener = function () {
    calcListener.call(this);
    return this;
}
JSListener.prototype = Object.create(calcListener.prototype);
JSListener.prototype.constructor = JSListener;
JSListener.tFileName = "test"
JSListener.prototype.enterProgram = function(ctx) {
    // create the target file
    openTarget()
};
JSListener.prototype.exitProgram = function(ctx) {
    // fill the target file and close it
    closeTarget()
};
JSListener.prototype.enterAssign = function(ctx) {
};
JSListener.prototype.exitAssign = function(ctx) {
    // get the variable
    var t1 = ctx.getChild(0).getText()
    // skip the := symbol to use = instead
    // get the expression
    var t2 = ctx.getChild(2).getText()
    write(t1 + "=" + t2)
};
JSListener.prototype.enterPrint = function(ctx) {
};
JSListener.prototype.exitPrint = function(ctx) {
    var temp = "console.log("
    // skip the 'print' keyword: the expression is the second child
    temp += ctx.getChild(1).getText()
    temp += ")"
    write(temp)
};
// export the listener so that the compiler can load it with require()
exports.JSListener = JSListener;
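The reason for deriving a separate listener can be shown in isolation. The sketch below uses class syntax and a stand-in for the generated calcListener, so that it runs without ANTLR; the lines buffer and the text property of the fake ctx are hypothetical. The point is only that the hand-written listener overrides the functions it needs and inherits empty defaults for the rest, so regenerating the base file does not touch it.

```javascript
// Stand-in for the generated calcListener (an assumption for this sketch):
// every enter/exit function exists and does nothing by default.
function calcListener() {
    return this;
}
calcListener.prototype.enterProgram = function (ctx) {};
calcListener.prototype.exitProgram = function (ctx) {};
calcListener.prototype.enterPrint = function (ctx) {};
calcListener.prototype.exitPrint = function (ctx) {};

// The hand-written listener overrides only the functions it needs.
class JSListener extends calcListener {
    constructor() {
        super();
        this.lines = [];   // hypothetical buffer for the target code
    }
    exitPrint(ctx) {
        // ctx.text is a simplification of ctx.getChild(1).getText()
        this.lines.push("console.log(" + ctx.text + ")");
    }
}

const listener = new JSListener();
listener.enterPrint({});               // inherited, does nothing
listener.exitPrint({ text: "x+8" });   // overridden
console.log(listener.lines);
```

Whether to use class syntax or the prototype style of the demo is a matter of taste; both leave the generated file untouched.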
The ctx argument of a listener function, exitPrint for the print rule for example, contains a tree of nodes representing the elements of this rule in the grammar. For the print rule there are three child nodes:
- The keyword PRINT.
- The expression.
- The end of line symbol EOL.
The first node is ignored since the target language uses console.log instead, the second is passed on directly with ctx.getChild(1).getText(), and the last one is ignored.
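The handling of the three child nodes can be tried without ANTLR by faking the ctx object. In this sketch makeContext and translatePrint are hypothetical helpers: the first builds an object exposing getChild and getText like a rule context, the second repeats the logic of exitPrint but returns the line instead of writing it.

```javascript
// Illustration only: a hand-made stand-in for the ctx object of the print
// rule. makeContext is a hypothetical helper, not part of the ANTLR runtime.
function makeContext(texts) {
    const children = texts.map(text => ({ getText: () => text }));
    return {
        getChild: index => children[index],
        getChildCount: () => children.length
    };
}

// Same logic as exitPrint in the listener, returning the line instead of
// writing it to the target file.
function translatePrint(ctx) {
    return "console.log(" + ctx.getChild(1).getText() + ")";
}

// Children of "print x + 8": the PRINT keyword, the expression, the EOL.
// Whitespace is skipped by the lexer, so the expression text has no spaces.
const printCtx = makeContext(["print", "x+8", "\n"]);
console.log(translatePrint(printCtx));
```

This prints console.log(x+8), the line that ends up in the target file for the second statement of the sample program.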
Implementing the compiler in JavaScript
In addition to the listener instructions that retrieve the source code and convert (or reuse) it as target code, the compiler requires other functions, which are placed in a separate file, implement.js.
This file is included directly in the listener with the eval command, to keep things simple. In a production version we would rather build a module.
var tContent = []
var tFile = undefined;
var openTarget = function() {
    try {
        tFile = fs.openSync(JSListener.tFileName, "w")
    }
    catch(err) {
        console.log("Target file not created. " + err.message)
    }
}
var closeTarget = function() {
    if(tFile === undefined) return;
    // write each stored line, then release the file descriptor
    for(var line of tContent) {
        fs.writeSync(tFile, line.trim() + "\n")
    }
    fs.closeSync(tFile)
}
var write = function(data) {
    tContent.push(data)
}
It contains three functions:
- write, which places each line of the target code in an array.
- openTarget, which creates the target file and is called at the beginning of parsing.
- closeTarget, which saves the array in the target file and is called at the end of parsing.
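As mentioned earlier, a production version would rather expose these functions from a module than include them with eval. Here is one possible shape for such a module, as a sketch only: createTarget is a hypothetical name, and the file is written in a single call when the target is closed instead of being opened at the start of parsing.

```javascript
// A possible module version of implement.js (a sketch, not the demo's code).
// createTarget is a hypothetical name.
const fs = require("fs");
const os = require("os");
const path = require("path");

function createTarget(fileName) {
    const lines = [];
    return {
        // store one line of target code
        write(line) {
            lines.push(line);
        },
        // save all the stored lines in the target file
        close() {
            fs.writeFileSync(fileName, lines.map(l => l.trim() + "\n").join(""));
        }
    };
}
// In a real module file this would end with:
// module.exports = { createTarget }

// Usage: write the two lines of the sample program to a temporary file.
const outFile = path.join(os.tmpdir(), "calc-demo-output.js");
const target = createTarget(outFile);
target.write("x=10+5");
target.write("console.log(x+8)");
target.close();
console.log(fs.readFileSync(outFile, "utf8"));
```

With this design each listener instance can own its target, so the file name no longer has to be stored on JSListener itself.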
A final compiler would contain many other functions, such as type checking and error handling.
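As an example of the kind of check that could be added, the sketch below reports variables that are used before being assigned. It is not part of the demo: checkProgram and the statement objects are hypothetical, and in the real compiler this logic would live in the exit functions of the listener.

```javascript
// Sketch of a semantic check: report variables used before assignment.
// checkProgram and the statement shape are hypothetical.
function checkProgram(statements) {
    const defined = new Set();
    const errors = [];
    statements.forEach((stmt, index) => {
        // collect the identifiers that appear in the expression
        const used = stmt.expression.match(/[A-Za-z_][A-Za-z0-9_]*/g) || [];
        for (const name of used) {
            if (!defined.has(name)) {
                errors.push("line " + (index + 1) + ": undefined variable " + name);
            }
        }
        // a variable becomes defined once its assignment has been checked
        if (stmt.kind === "assign") defined.add(stmt.variable);
    });
    return errors;
}

const errors = checkProgram([
    { kind: "assign", variable: "x", expression: "10+5" },
    { kind: "print", expression: "x+y" }
]);
console.log(errors);
```

Here the second statement uses y, which was never assigned, so the result contains the single message "line 2: undefined variable y".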
Running a first program
A small demonstration program, test.calc, is included in the archive:
x := 10 + 5
print x + 8
Each statement ends with an end of line, including the last one: the end of file is not taken into account in this demo.
The compiler itself is in the calc.js file. Try it on the sample file test.calc:
node calc.js test.calc
This generates the test.js file. To run it, type:
node test.js
This should display: 23.
The compiler and all the files necessary for the demonstration are available in an archive to download: