A Simple Interpreter in C++

A little while ago I made my first post here and put up a link to a simple interactive shell I had made. Now, If you went out of your way to play with the shell it might not take very long for you to find some issues with it. If you are bold enough to actually open up the source code for it, you might find the overall implementation to be fairly poor. Maybe it isn’t a good idea for me to post code that is not as good as it can be since it could reflect poorly on me as a programmer. However, to me, the goal of building things like this is to learn.  I am perfectly fine with the idea that the code I write is not perfect and I don’t think that should say anything bad about me as a programmer up front. In general, I would hope the ambition to continuously learn and improve with everything I create outweighs the fact that these creations may not end up being brilliant or flawless pieces of software.

So what was the problem with the interactive shell?
Well, for starters, there was no solid organization for how I tried to parse or interpret the commands entered by the user.
What do you mean by that?
At the time of starting the project, I had absolutely no experience or knowledge of compiler theory or the general organization of a compiler/interpreter. In other words: I didn’t know that a compiler is usually broken up into a lexer and parser. As a result, no formal tokenization really ever takes place, which doesn’t make the process of parsing and evaluating expressions any easier.

I don’t want to spend too much time talking about what I didn’t do correctly, I just want to mention some points which inspired me to take another stab at a similar problem in a more intelligent way. As I mentioned, the interactive shell was primarily motivated by my desire to build something in Javascript and it was just something to keep me coding at the time.  It was not about trying to learn intelligent methods of parsing. Recently however, I decided to try to build a basic interpreter in C++ and I wanted to go about it in a more organized fashion.

All I wanted was the ability to, once again, evaluate arithmetic expressions and to allow the creation and use of variables in these expressions. Instead of being an interactive shell, it would simply read in a full script from a text file and evaluate the expressions line by line and print any results out to the console. I admit, this is not terribly useful, but I wanted to try and do things more intelligently.

What do I mean by ‘more intelligently’? I mean that I wanted to have a formal lexer which performed tokenization first, and then these tokens would be used by the parser to evaluate the expressions based off of a well defined grammer. So what all is needed?

First, I defined the valid set of tokens which are allowed:

NUMBER (an integer number. no decimal places)

IDENTIFIER (basically a variable name. can start with upper or lower case letter. can contain letters, digits, and underscores)

LINECOMMENTS(the usual ‘//’ for line comments is allowed. these arent actually technically a token since they are ignored by the lexer)

PLUS(+), MINUS(-), DIVIDE (/), MULTIPLY (*), EQUALS (=), SEMI(;), LPAREN( ‘(’ ), and RPAREN ( ‘)’ ).

Given these tokens, I define a very basic grammer for the different types of statements which are valid. Basically we only have 2 types of statements: a straight forward expression to be evaluated, or an assignment statement where the value of an expression is placed into a variable.

An assignment statement is defined as: IDENTIFIER EQUALS expression SEMI

where expression is defined as one of: NUMBER or IDENTIFIER or LPAREN expression op expression RPAREN SEMI

and op is defined as: PLUS or MINUS or DIVIDE or MULTIPLY

Furthermore, the value of a straightforward expression is printed to the console while nothing is printed in an assignment statement. So, an example script could end up looking like:

//This is just a simple script…

(3+4);
(12 - (3*7));
((10-5) * (3+5));
(((5-2) + (3*2)) + 10);
A = (3+4);
A = (A + 4);
B = A;
B;

The syntax is far from elegant, but that’s not really the point. The parsing is my attempt at something resembling recursive descent parsing, which you can read a bit more about here.  My actual code is heavily influenced in style from that in the article. However, I did do some things a bit differently, especially because I also wanted to actually evaluate the expressions as I was parsing them.

It’s still nothing amazing, but I feel like my code is more easily understood and organized for now. speaking of the code, you can get it here. I wrote it in Microsoft Visual C++ Express Edition and I simply zipped up the Visual Studio project to post. Take a look at it if you’re interested and feel free to share any comments or critisism. Its getting late, and I’m off to bed. Goodnight….

Leave a Comment

You must be logged in to post a comment.