A Programmer's Progress: Developing The Bkeeping Language

Throughout my Software career, I've always tried to build my skill set on top of each other. Learning and becoming proficient with the visitor pattern and recursive design in general leapt my coding skills to a much higher level than they had previously been. Learning to write my own little syntax for a custom task was the next level.

My quest began about 2 or more years ago, working in a small development shop, with a lot of functionality to implement. I was coding a lot and the ways in which we were giving users and library users in which to control our system seemed heavy. "I wish I could just express to the computer what I want to do and have it perform the functionality", I blurted. About a year later(after finishing that contract), I stopped by the company and started talking about programming in general and the expressive power that various patterns offer. Using OO to express, define, and execute complex behaviour just didn't seem enough. "You really need something simpler that let's you do all that... (figuring pause) That's a language!!!", I blurted. The revelation just fell out of me and I suddenly understood what the Interpreter pattern was for.

Fast forward to Bookkeeping and I’d come across the need for the program to read user input to control the system. Since there are a number of commands a user can make, parsing and validating that input started becoming a daunting task. This was the perfect opportunity to develop a grammar that handles that input.

WHEN TO USE A LANGUAGE

A domain specific language (DSL) is a computer language that's targeted to a particular kind of problem domain or task. This is opposed to a general purpose language that's aimed at any kind of software problem. For example, ‘bookkeeping’ is required to read user input to control the system. The problem domain in ‘bookkeeping’ was executing double entry bookkeeping functions on hierarchical journal data. I needed those functions to be recursive, and deal with evolving data as the user deals with those journal entries and account data.

If code compactness, reuse and overall maintainability are major concerns, then developing a DSL lets you solve problems by writing sentences in that DSL, rather than recoding your algorithm every time you need it. They enable expression and validation of concepts at the level of abstraction of the problem domain.

CONSTRAINTS & GOALS

I wanted a way to work with double-entry bookkeeping data that was in an XML format. The logical structure of the data necessitated me working with a hierarchical structure where debits exist inside accounts, and entries exists inside of a journal, and so on. I wanted a way for the language to deal with rudimentary bookkeeping functions like adding, finding and reversing entries.

So the design constraints of bkeeping were to deal with logical bookkeeping data structure and the raw XML structure in which that data was encoded. The design goals were to be able to execute bookkeeping functions on this data structure. For compactness, I also wanted to be able to use the results of a function directly as the input to another function. So the language could look like fig.1.

func1 (
func2 ( )
);

fig.1

With the data format as XML, I wanted a way to ferry around that XML data from function to function. So XML can be used as input to a function ( tokens with options are also used ) , and output from a function ( list of XMLs can also be produced ).

func1 (
func2 (
funcIn3 ( ),
funcIn4( token -opt1 123 -opt2 abc )
)
);

fig.2

DESIGN CHOICES & RATIONALE

i) variables
I had already started to develop the shell environment for bkeeping (bkell) when I began considering variables. XML and lists of XML is the data that bkeeping uses. Variables were a language artefact that I realised I needed, as a container of return values or simply data that I wanted to use later in my script.

ii) explicit scoping
As a programmer and program user, I prefer the readability of explicit scoping in my programs, even if it leads to a bit more verbosity. The fact that the ‘if’ statement in Java doesn’t need brackets if the code block is on one line is very annoying for me from a readability point of view. So function blocks, input parameters, and context input are all surrounded with brackets to clearly indicate the section, scope and intention of a particular block of code.

iii) capturing context input for a function
Often, the input data of a function must operate on a given piece of data. This context data on which the function operates can come from anywhere in the program data, and I wanted a specifi way to identify on which context you would like to execute a command. For example, you can add a debit or credit to any journal entry in the system. Thus the add command provides a block to specify the context, or journal entry in which to add a debit.
ex:
add ((token.literal) token.literal,...)

iv) recursive functions
I wanted the ability to include the output of a function as the input to another function. Additionally, I like the readability and decomposability that list offers, in that a sub-block of code can be read and executed as a self-contained code block itself. For these reasons, I wanted to be able to make functions recursively nesting.

v) comments
I followed the Java example for single and multi line comments.

Following all these design decisions, bkeeping code looks like the code in fig.3. An interesting side effect of these feature ( especially explicit scoping and recursive functions ) is that bkeeping code looks a lot like lisp. I remember some weeks after having released Bookkeeping, reading some lisp code and completely understanding what it did ( and I don’t know lisp yet ).

// single comment line describing variable
var1 = func0();

/**
* comment block for multiple lines
* describing function block
*/
func1 (
func2 (
funcIn3 ( ),
funcIn4( token -opt1 123 -opt2 abc ),
@var1
)
);

fig.3

FUTURE DIRECTION

i) parsing at expression terminal
Currently, the parser generator SableCC requires that the user enter and EOF character to begin parsing a phrase. This is reduces the ease of use and universality of bkeeping input scripts. A user must figure out how to produce an EOF character in their given environment, in their given platform. This should be resolved in a future version of SableCC, allowing the bkell shell ( and thus the user ) to begin parsing at the end of an expression.

ii) importing bkeeping files
As a side effect of the inability to parse bkeeping phrases at expression terminal, we cannot import supporting or library bkeeping files. This feature will allow bkeeping developers better organise and separate sections of code into libraries that can be reused and better maintained.

iii) function definitions
Currently, there is a finite amount of functions in the system. If a bkeeping developer should have the need to write their own specific functions, this will give them that ability. This feature will also improve the ability to create custom libraries and functions that can be reused.

iv) control loop
A mechanism is needed to iterate over lists of XML.

v) using XPath expressions for searching
XPath searching is a future feature of ‘Bob’, the underlying architectural framework that bookkeeping uses. This feature will allow a bkeeping user to more specifically and expressively search for a piece of information in the bookkeeping data structure.

CHOOSING A COMPILER COMPILER TOOL

There were several options from which to choose a compiler compiler (or parser generator) tool. I evaluated these tools in the following sequence, learning from the mistakes or shortcomings of the previous.

A. Interpreter Pattern
The Interpreter Pattern was something I'd wanted to try out (in real software) since my revelation. It is a good pattern when the input has already been lexically parsed or the command set is very simple. If the syntax or input is complex ( using recursion and raw XML ), the task of parsing the input and storing it in a lexical tree is difficult as parsing and executing the language itself. With this point, I understood the functional complexity and differences between lex and yacc. The syntax I designed for Bookkeeping allows users to recursively execute commands and return inputs for another command. When I tried parsing user input for 'Bookkeeping', I came across 2 main problems.
i) This syntax is not recursive. Thus one cannot put commands into commands
ii) Extracting raw XML from command line is very difficult

B. JFlex / Byacc/J
Flex / Lex - a Lexical Analyser Generator, or the part of a compiler process that tries to make sense of the input
Bison / Yacc - Generates a parser based on an analytic grammar in BNF

Originally, my main problem with using this combination was that I had to tie the development environment to the platform that JFlex/Byaccj was first used on (a Mac in this case). But I began to realise that there were better parser generation techniques for java projects.

Using JFlex's input lexical file is not hard so much as the format is not what I'm used to in the Java world. I started with the simplest possible lexical configuration file then grew my definitions, rules and actions incrementally. I started to understand how to interface the tokens into something that Byacc/J can understand. But by that time, I was already looking at some all java parser generator solutions.

C Javacc
Javacc looked robust. Also, it was a Sun project. However, I found little documentation and the documentation I did find did not describe how to use the tool.

D. Antlr
I tried using this package because it is well documented and very popular in the java development community. The difficulties in learning were the following:
i) calls to import external grammar definitions
ii) handling lexical nondeterminism ( the lexer cannot differentiate between part of the same pattern ex: "-entry" vs. "-entrynum"). This is a condition where the lexer could possibly confuse the definitions between different rules with part of the same pattern. So "entry" vs. "-entrynum" looks the same when it is only looking at 1 or 2 characters at a time. I solved this by setting the lookahead depth 'k', to a level that distinguashes '-entry' and '-entrynum'. So whereas k=2, the lexer would only see '-e' and '-e' respectively. Where k=7, the parser sees '-entry' and '-entryn' as distinct. Set the Lexer's 'k' option like below.
iii) defining recursive calls in my grammar

E. Sablecc
I chose SableCC as it was very simple and elegant, yet powerful enough to handle the parser generator needs in bkeeping.

FEEDBACK ON BKEEPING

I’ve gotten some good feedback on Bookkeeping and the DSL it uses, BKeeping. Emmanuel Pirsch is a Senior Consultant at CGI and had some insights into the overall design of Bookkeeping and bkeeping. I’ve included both his emailed feedback (at the bottom) and my response (at the top) to give a clear idea of his message.

Hey Emmanuel, thanks for your feedback. And I like your comments.   I haven't worked with YAML yet. But from what I can see of a simple YAML document, it seems strange to organise structure based on spaces ( tabs not included ). It may just be preference, but I like explicit scoping, which is what something like JSON(http://json.org/) provides. Plus it's a bit more flexible and less verbose than XML. James Clark has some interesting things to say about it (http://blog.jclark.com/2007/04/do-we-need-new-kind-of-schema-language.html).   I agree with your reservations on customising the generated code. What I typically do is make subtypes of the generated classes or interfaces. I'm also toying with a format for how to customise generated code in a CDATA block. I didn't want to make too many deep architectural decisions before getting the tool out to the public and having some feedback.   As for the 'BKeeping' command language, it is actually a DSL. My next step is to i) implement function definitions, ii) loops, and iii) a file include mechanism (but that'll have to wait for SableCC 4). I see where you're going with the more naturally syntaxed DSL below. But again, I prefer (as a user) explicit scoping in my languages. From a user's point of view, I would expect a command with options to follow the same format, so fig.1 and fig.2 are weird for me as I'm creating a business object in both cases, but how and when do I know to put options? Remember that in 'BKeeping', the commands in fig.3 all produce the same results (try them out).   create asset account Cash fig.1   create journal QuebecExpenses fig.2    var j1 = create ( journal -name QuebecExpenses ); var j1 = create ( );  var j1 = create( create ( journal -name QuebecExpenses )); fig.3    As far as variables go, the commands in fig.4 assume that Cash, Bus and QuebecExpenses were all previously stored to memory. Again I prefer explicit. In a big file or when executing a lot of commands, you (the user) could loose track of your variable names. One thing I like that you did was that debit / credit accounts and post to a journal in 1 command. The commands in figures 1, 2 and 4 could be combined into 1 command that looks like fig.5, but as you can see, it's a bit more verbose (corresponding accounts are automatically touched).   register debit_from Cash credit_to Bus of 50$ in QuebecExpense (no dates, so it must be today) register debit_from Cash credit_to Bus of 50$ in QuebecExpense on 01/10/2007 fig.4    add ( (@j1 )   add ( (create (entries))  add ( (create (entry))  create ( debit -amount 50.00 -account cash )   create ( credit -amount 50.00 -account bus )   )  ) ); fig.5    Tim--— Original Message -— From: Emmanuel Pirsch  To: Timothy Washington  Sent: Friday, April 13, 2007 9:19:26 AM Subject: Re: Bob and Bookkeeping  Hi Timothy,  This is nice work... Unfortunately, I tend to distance myself from XML and especially from code generation tool.  While XML is a great way to express data structure and parsers are readily available, nowadays I prefer to use Yaml for datastructures and DSL (Domain Specific Language). XML is still important as it's now a big part of how stuff is exchanged on the Internet but it's to verbose to my taste.  As for code generation tool, they create a maintenance hell (you can never really customize the generated code in a way that your modification will not be overwritten when you regenerate it) and the problems they aim to solve are better solved with better design and frameworks. Code generation may be if you never need to customize the generated code, but most of the time you do.  For your bookkeeping software, I think switching to a DSL would allow for a more concise and readable way of working. Your command language is a start but is does not express clearly the intentions in the business language...  You should forget about variables the way you use them, something that may resemble the following may be more "natural" : create expense account Bus create asset account Cash create journal QuebecExpenses  register debit_from Cash credit_to Bus of 50$ in QuebecExpense (no dates, so it must be today) register debit_from Cash credit_to Bus of 50$ in QuebecExpense on 01/10/2007 This makes for a much clearer way ding stuff. It feels more natural. Granted, it's harder to code but usability increase a lot.  A good way to create the DSL, is to think how the user would use the software (think about the business domain). Then create a syntax that the user would feel comfortable with. Think also of using sensible defaults when the user does not express everything (why would he care about an id or about taxes when entering stuff... The tax rules should be in the software - maybe associate with the journal). You can customize the DSL to be easier to parse or due to some technical issues, but do it after you decide the general structure.  Remember the rule that most of the time, less is more!   SeeU!

Posted by Timothy Washington on 2007.06.06| Original post

#archive #frye #thebox

A Programmer's Progress: Developing The Bkeeping Language

Read more

Photo Album_Chillin' In Montreal

Photo Album_Chillin' In New York

Photo Album_Coding

Photo Album_Europe 2003