Programming languages question

Warning: Long and nerdy.

My current hobby project is a programming language named Deck. The goal is for Deck to be basically Lispish but to have a friendlier syntax.

For those not familiar with Lisp, a brief overview. The fundamental Lisp data type is a list, like so:

(1 2 3 4 5)

Lists can nest:

(1 2 (3 4 5) 4 5)

Lisp expressions are also lists. The first item in an expression is the function to call and the remaining items are the arguments. Nested lists are sub-expressions:

(add 2 (multiply 3 4))

calls the function multiply on 3 and 4, then calls add on 2 and the result of the multiply call.
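To make the evaluation order concrete, here is a minimal sketch in Python (not Lisp, and not Deck) that treats nested Python lists as expressions. The FUNCTIONS table and the evaluate helper are made up for illustration; only the names add and multiply come from the example above.

```python
# Hypothetical function table for illustration; a real Lisp has far more.
FUNCTIONS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}


def evaluate(expr):
    """Evaluate a Lisp-style expression stored as nested Python lists."""
    if not isinstance(expr, list):
        return expr  # a plain number evaluates to itself
    name, *args = expr
    # Sub-expressions are evaluated first, then the function is applied.
    return FUNCTIONS[name](*[evaluate(arg) for arg in args])


print(evaluate(["add", 2, ["multiply", 3, 4]]))  # prints 14
```

The expression is an ordinary list the whole time, which is why a program can build or rewrite one before evaluating it.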

But remember, lists are a fundamental data type, and Lisp can do all kinds of manipulation of lists. This means that Lisp programs are really good at creating, examining and manipulating their own code. This makes the language really powerful (and remarkably non-horrible to debug, believe it or not).

The downside of this is that Lisp code is cluttered, full of parentheses and hard to read. The goal of Deck is to have a cleaner, more readable syntax without losing the power of Lisp expressions. My solution is to give Deck a syntax that gets compiled into the more common Lisp-style prefix expressions, but in a way that is pretty transparent to the user. I do this by adding two new syntax features: infix expressions and line-oriented lists of lists (LoLs).

Infix expressions are pretty simple. We take an expression like this:

(a + b * c)

and transform it to:

[+ a [* b c]]

To keep the parser from getting confused, I switch to using square brackets for prefix (i.e. Lisp-style) expressions and surround infix expressions with round parentheses:

[pivot list ((end - start) / 2 + start)]

or

(a := b + [sin c])
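One common way to do this kind of infix-to-prefix rewrite is precedence climbing. The sketch below is in Python and is only an illustration, not Deck's actual parser. The precedence table, the associativity choices and the to_prefix name are all assumptions. Operands are treated as opaque atoms, so a bracketed prefix expression is passed through as a nested list and parenthesised sub-expressions are not handled.

```python
# Assumed precedence table (higher binds tighter). Deck's real table may differ.
PRECEDENCE = {":=": 1, "=": 1, "+": 2, "-": 2, "*": 3, "/": 3, "**": 4}
RIGHT_ASSOCIATIVE = {":=", "=", "**"}


def precedence_of(token):
    """Return the operator precedence, or 0 for anything that is not an operator."""
    return PRECEDENCE.get(token, 0) if isinstance(token, str) else 0


def to_prefix(tokens):
    """Convert a flat infix token list into nested prefix lists.

    Operands may be names, numbers, or lists that are already in prefix form.
    """
    pos = 0

    def parse(min_prec):
        nonlocal pos
        if pos >= len(tokens) or precedence_of(tokens[pos]) > 0:
            raise ValueError("expected an operand")
        left = tokens[pos]
        pos += 1
        # Keep absorbing operators that bind at least as tightly as min_prec.
        while pos < len(tokens) and precedence_of(tokens[pos]) >= min_prec:
            op = tokens[pos]
            pos += 1
            prec = PRECEDENCE[op]
            # Left-associative operators require a strictly tighter right side.
            right = parse(prec if op in RIGHT_ASSOCIATIVE else prec + 1)
            left = [op, left, right]
        return left

    result = parse(1)
    if pos != len(tokens):
        raise ValueError("expected an operator")
    return result


print(to_prefix(["a", "+", "b", "*", "c"]))
print(to_prefix(["a", ":=", "b", "+", ["sin", "c"]]))
```

The first call gives the nested-list equivalent of [+ a [* b c]], and the second gives the equivalent of [:= a [+ b [sin c]]].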

The second feature is the List of Lists (LoL). A LoL is delimited by braces ({ and }) and represents a quoted list containing multiple sublists, each of which starts at the beginning of the line and ends at the first newline that isn't inside some other unclosed bracketing thing:

{
    z_real = 0
    z_imm = 0

    while {[magnitude z_real z_imm] < 2.0} {
        z_real = z_real ** 2 + c_real
        z_imm  = z_imm  ** 2 + c_imm
    }
}

(And there's some extra syntactic sugar that I'm glossing over.)

This turns into a list of 3 lists:

`[  
    [= z_real 0]
    [= z_imm 0]
    [while `[< [magnitude z_real z_imm] 2.0] `[ ... ]]
 ]
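The line-splitting part of this can be sketched by tracking bracket depth. The Python below is a rough illustration under stated assumptions, not Deck's real reader: split_lol_lines is a made-up name, it takes the text between the outer braces, and it ignores string literals and comments, which a real parser would have to handle.

```python
OPENERS = "([{"
CLOSERS = ")]}"


def split_lol_lines(body):
    """Split the text inside a LoL's braces into logical lines.

    A newline only ends a line when no bracket of any kind is still open.
    """
    lines, current, depth = [], [], 0
    for ch in body:
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS:
            depth -= 1
        if ch == "\n" and depth == 0:
            text = "".join(current).strip()
            if text:  # blank lines produce no sublist
                lines.append(text)
            current = []
        else:
            current.append(ch)
    text = "".join(current).strip()
    if text:
        lines.append(text)
    return lines


body = """
    z_real = 0
    z_imm = 0

    while {[magnitude z_real z_imm] < 2.0} {
        z_real = z_real ** 2 + c_real
        z_imm  = z_imm  ** 2 + c_imm
    }
"""
for line in split_lol_lines(body):
    print(repr(line))
```

This yields three logical lines. The third one is the whole while statement, nested braces and inner newlines included, and the nested LoLs would then be split the same way recursively.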

(At this point, I realized that the syntax was pretty much the same as Tcl but with most of the ugly parts gone so I just went with that.)

(Also, = is a macro, which is how it can do assignment. If you don't know what this means, it's not important.)

The parser has a simple rule for identifying infix lines: if the even-numbered elements (counting from one) are operators (e.g. '=', '+', etc.) and there is an odd number of elements, it's infix. Otherwise, it's prefix.
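As a sketch, the rule might look like this in Python. The OPERATORS set is an assumed sample, and requiring at least three elements is also an assumption, made so that a lone word is not vacuously counted as infix. Elements are assumed to be already tokenised.

```python
# Assumed sample of operator tokens; Deck's actual set is not spelled out here.
OPERATORS = {"=", ":=", "+", "-", "*", "/", "**", "<", ">"}


def is_operator(token):
    return isinstance(token, str) and token in OPERATORS


def is_infix(elements):
    """Odd number of elements, with operators in positions 2, 4, 6, ..."""
    if len(elements) < 3 or len(elements) % 2 == 0:
        return False
    # elements[1::2] selects the even-numbered positions when counting from one.
    return all(is_operator(token) for token in elements[1::2])


def classify(elements):
    return "infix" if is_infix(elements) else "prefix"


print(classify("a = b + c".split()))    # prints infix
print(classify("print a b c".split()))  # prints prefix
```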

This is nice and flexible but it has one downside. Consider:

a = b + c

This is infix and turns into:

[= a [+ b c]]

But suppose I typoed:

a = b + + c

Now there are six elements (an even number), so it's prefix and turns into:

[a = b + + c]

So I extended the rules to be:

  1. If the even-numbered elements are operators and there is an odd number of elements, it's infix.

  2. If it contains no operators, it's prefix.

  3. Otherwise, it's an error.

Unfortunately, this breaks a number of useful expressions:

for n = 1 to 10 { ... }
var n = 42

I can think of three possible solutions:

  1. Discard rule #2's no-operators restriction, so that anything that isn't infix is prefix again, and let the user catch bad behaviour.

  2. Add special cases to rule #2 for for, var and a couple of other things.

  3. Change rule #2 to: It's prefix if there are no operators in the first, second or last position and there are no sequences of two or more adjacent operators.

Of the three, I like #3 the best. #1 adds a serious gotcha to the language and #2 leaves the language a lot less flexible than I'd like it to be. #3 seems like a reasonable compromise.
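To compare the strict rules with option #3, here is a rough Python sketch of both classifiers. The OPERATORS set, the function names and the three-element minimum for infix are assumptions for illustration, and the brace block is stood in for by a single placeholder token.

```python
# Assumed sample of operator tokens; Deck's actual set is not spelled out here.
OPERATORS = {"=", ":=", "+", "-", "*", "/", "**", "<", ">"}


def is_operator(token):
    return isinstance(token, str) and token in OPERATORS


def is_infix(elements):
    """Rule 1: odd number of elements, operators in the even-numbered positions."""
    if len(elements) < 3 or len(elements) % 2 == 0:
        return False
    return all(is_operator(token) for token in elements[1::2])


def classify_strict(elements):
    """The extended rules: infix, operator-free prefix, or error."""
    if is_infix(elements):
        return "infix"
    if not any(is_operator(token) for token in elements):
        return "prefix"
    return "error"


def classify_relaxed(elements):
    """Option #3: prefix unless an operator is first, second or last,
    or two operators sit next to each other."""
    if not elements:
        return "prefix"
    if is_infix(elements):
        return "infix"
    flags = [is_operator(token) for token in elements]
    at_edge = flags[0] or flags[-1] or (len(flags) > 1 and flags[1])
    adjacent = any(a and b for a, b in zip(flags, flags[1:]))
    return "error" if at_edge or adjacent else "prefix"


samples = [
    ["a", "=", "b", "+", "c"],
    ["a", "=", "b", "+", "+", "c"],
    ["for", "n", "=", "1", "to", "10", "{...}"],
    ["var", "n", "=", "42"],
]
for elements in samples:
    print(" ".join(elements), classify_strict(elements), classify_relaxed(elements))
```

With these assumptions, the typo line is an error under both versions, while the for and var lines are errors under the strict rules but prefix under option #3.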

So, can anyone think of a better solution to my problem, or a potential gotcha with #3?


#   Posted 2009-02-17 15:06:00 UTC; last changed 2009-03-24 22:02:00 UTC