Sic: Yet Another Mediocre Small Lisp Dialect
Like many bad ideas, this one came from seeking attention on social media. I had envisioned tooting something like this on Mastodon:
(dontforgettolikeandsubscribe)
This was the result of a series of realizations I had while learning more about modern C++. Specifically:
1. $ is a valid 'word' character in C++ these days.
That means you can use it in names (strictly speaking it's a compiler extension, but GCC, Clang and MSVC all accept it). So this is valid C++:
const int $20_same_as_in_town = 20;
But since this is not widely known, I can use it for all kinds of shenanigans.
2. You can use variadic templates to fake Lisp-style expressions.
As you know Bob, variadic templates are function templates that take arbitrarily many arguments. And since they're templates, their type is also up for grabs.
And as you also know Bob, Lisp-style lists are just linked lists of pairs of pointers where the first pointer holds the value and the second the next pair in the sequence. And Lisp expressions are just lists of expressions where the first value is the function to call while the rest are its arguments.
So if we create a bunch of C++ classes to represent basic Lispish types:
class string : public obj { ... };
class symbol : public obj { ... };
class number : public obj { ... };
plus a common base class:
class obj { ... };
plus a simple C++ class to hold a pair:
class pair : public obj {
public:
    obj * const first; // Not named 'car'; cope.
    obj * const rest;  // Ditto for 'cdr'.
    pair(obj *a, obj *d) : first(a), rest(d) {}
};
and a set of overloaded helper functions to convert basic C++ types to Lispish types:
static inline obj* _w(std::string s) { return new string(s); }
static inline obj* _w(int i) { return new number((long)i); }
static inline obj* _w(long l) { return new number(l); }
static inline obj* _w(double d) { return new number(d); }
static inline obj* _w(obj *o) { return o; }
we can create a variadic function template that constructs a Lispish list:
template<typename T> obj* $(T o) { return new pair(_w(o), nil); }
template<typename T, typename... Objects>
obj* $(T first, Objects... rest) {
    return new pair(_w(first), $(rest...));
}
(The first definition of $ works on any call with one argument; the
second expands to a function that takes the first argument and
recurses on the rest.)
Calling it looks like this:
$("foo", 42, $("add", 2, 2))
which looks Lispish if you squint hard enough. But the resulting list is an actual Lisp-style list.
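A minimal sketch of how these pieces could fit together, kept small enough to compile on its own. The class bodies, the repr method and the choice of a null pointer for nil are assumptions made for illustration, not necessarily what sic.hpp does; the $ name also relies on the compiler accepting $ in identifiers.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Stub versions of the Lispish types, just enough to build and print a list.
class obj {
public:
    virtual ~obj() = default;
    virtual std::string repr() const = 0;   // hypothetical helper for printing
};

class string : public obj {
public:
    const std::string contents;
    explicit string(const std::string& s) : contents(s) {}
    std::string repr() const override { return "\"" + contents + "\""; }
};

class number : public obj {
public:
    const double value;
    explicit number(double v) : value(v) {}
    std::string repr() const override {
        std::ostringstream out;
        out << value;
        return out.str();
    }
};

obj* const nil = nullptr;   // assumption: the empty list is a null pointer

class pair : public obj {
public:
    obj* const first;
    obj* const rest;
    pair(obj* a, obj* d) : first(a), rest(d) {}
    std::string repr() const override {
        std::string out = "(";
        for (const pair* p = this; p; p = dynamic_cast<const pair*>(p->rest)) {
            assert(p->first != nil);   // this sketch never stores nil as an item
            if (p != this) out += " ";
            out += p->first->repr();
        }
        return out + ")";
    }
};

// Conversions from C++ values to Lispish objects, as in the article.
static inline obj* _w(std::string s) { return new string(s); }
static inline obj* _w(int i) { return new number((long)i); }
static inline obj* _w(long l) { return new number(l); }
static inline obj* _w(double d) { return new number(d); }
static inline obj* _w(obj* o) { return o; }

// One argument: a single pair terminated by nil.
template<typename T> obj* $(T o) { return new pair(_w(o), nil); }

// Several arguments: wrap the first, recurse on the rest.
template<typename T, typename... Objects>
obj* $(T first, Objects... rest) {
    return new pair(_w(first), $(rest...));
}
```

With these definitions, $("foo", 42, $("add", 2, 2))->repr() walks the pairs and yields ("foo" 42 ("add" 2 2)), which is at least some evidence that a real list came out the other end.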
3. Evaluating functions is pretty straightforward.
A function is just one of the Lispish C++ types. It implements a
method named call, which takes an argument list and returns a
result:
class callable : public obj {
public:
    virtual obj* call(obj* actualArgs) const = 0;
};
Yeah, yeah, this is an abstract base class. We actually need two types of callables:
class function : public callable { ... };
class builtin : public callable { ... };
function is a function written in Sic. It holds a Lispish list of
expressions (also Lispish lists) and evaluates them by calling eval
on each of them. builtin holds a built-in function--a C++ lambda
(or other function pointer, in theory)--that it calls instead.
eval is the function that evaluates a Sic expression. You give it
the expression and if it's a list, it recursively calls itself on each
item and collects the results. Then, it calls the first item's call
method with the rest of the list as an argument and returns the
result. If the argument isn't a list, it just returns it.
So like any good Lisp function, it either does nothing significant or recurses.
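The evaluation rule described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual Sic source: the stub classes, the use of std::function inside builtin, the evalEach helper and the add builtin are all made up for the example, and there are no variables or symbols yet.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

class obj { public: virtual ~obj() = default; };

class number : public obj {
public:
    const double value;
    explicit number(double v) : value(v) {}
};

class pair : public obj {
public:
    obj* const first;
    obj* const rest;
    pair(obj* a, obj* d) : first(a), rest(d) {}
};

class callable : public obj {
public:
    virtual obj* call(obj* actualArgs) const = 0;
};

// A builtin wraps a C++ callable; std::function is an assumption here.
class builtin : public callable {
    std::function<obj*(obj*)> fn;
public:
    explicit builtin(std::function<obj*(obj*)> f) : fn(std::move(f)) {}
    obj* call(obj* actualArgs) const override { return fn(actualArgs); }
};

static obj* eval(obj* expr);

// Evaluate every item of a list and collect the results in a new list.
static pair* evalEach(obj* list) {
    auto* p = dynamic_cast<pair*>(list);
    if (!p) return nullptr;
    obj* head = eval(p->first);
    return new pair(head, evalEach(p->rest));
}

static obj* eval(obj* expr) {
    if (!dynamic_cast<pair*>(expr)) return expr;   // non-lists come back as-is
    pair* items = evalEach(expr);
    assert(items != nullptr);                      // expr was a non-empty list
    auto* fn = dynamic_cast<callable*>(items->first);
    if (!fn) throw std::runtime_error("first item is not callable");
    return fn->call(items->rest);
}

// Hypothetical builtin that sums a list of numbers.
static obj* const add = new builtin([](obj* args) -> obj* {
    double total = 0;
    for (auto* p = dynamic_cast<pair*>(args); p; p = dynamic_cast<pair*>(p->rest)) {
        auto* n = dynamic_cast<number*>(p->first);
        if (!n) throw std::runtime_error("add expects numbers");
        total += n->value;
    }
    return new number(total);
});
```

Evaluating the list (add 2 (add 3 4)) built from these objects recurses into the inner list first, so the outer add sees 2 and 7.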
4. Also, I can do macros because I hate myself.
As you know Bob--hey, why are you walking away? I NEED YOU FOR THIS RHETORICAL DEVICE, BOB!
Anyway, as Bob over there already knows, a macro is a powerful and elegant way to let programmers mutate their Lisp into a completely different language while introducing subtle and impossible-to-find bugs.
More precisely, it's a function that gets called on the raw, unevaluated argument list, does something with that and returns something else that does get evaluated as a normal Lispish expression. On compiled Lisps (i.e. not this one), it gets called by the compiler.
The thing is, we need macros for system-ish and control-flow-ish
stuff. You can't do an if statement if eval is always going to
evaluate the THEN and ELSE expressions regardless.
So we do macros by adding the isMacro flag to callable:
class callable : public obj {
public:
    const bool isMacro;
    virtual obj* call(obj* actualArgs) const = 0;
};
(isMacro gets set by the constructor.)
Then we make eval check if it's true. If it is, it calls the
function on the arguments first, before they're evaluated, captures
the result and recursively evaluates that.
And there you go. Self-modifying code made easy.
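A sketch of what that check might look like, again with stub types so that it compiles on its own. The builtin wrapper, the if_op macro, the boom builtin and the convention that a null pointer counts as false are all assumptions for the example; the real Sic code may be organized quite differently.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

class obj { public: virtual ~obj() = default; };
class number : public obj {
public:
    const double value;
    explicit number(double v) : value(v) {}
};
class pair : public obj {
public:
    obj* const first;
    obj* const rest;
    pair(obj* a, obj* d) : first(a), rest(d) {}
};

class callable : public obj {
public:
    const bool isMacro;
    explicit callable(bool macro) : isMacro(macro) {}
    virtual obj* call(obj* actualArgs) const = 0;
};

class builtin : public callable {
    std::function<obj*(obj*)> fn;
public:
    builtin(bool macro, std::function<obj*(obj*)> f)
        : callable(macro), fn(std::move(f)) {}
    obj* call(obj* actualArgs) const override { return fn(actualArgs); }
};

static obj* eval(obj* expr);

static pair* evalEach(obj* list) {
    auto* p = dynamic_cast<pair*>(list);
    if (!p) return nullptr;
    obj* head = eval(p->first);
    return new pair(head, evalEach(p->rest));
}

static obj* eval(obj* expr) {
    auto* list = dynamic_cast<pair*>(expr);
    if (!list) return expr;
    auto* fn = dynamic_cast<callable*>(eval(list->first));
    if (!fn) throw std::runtime_error("first item is not callable");
    if (fn->isMacro) {
        // Macros see the raw arguments; whatever they return is evaluated.
        return eval(fn->call(list->rest));
    }
    return fn->call(evalEach(list->rest));
}

// Hypothetical 'if' macro: evaluates the condition itself, then hands back
// only the chosen branch, unevaluated, for eval to finish.
static obj* const if_op = new builtin(true, [](obj* args) -> obj* {
    auto* c = dynamic_cast<pair*>(args);
    auto* t = c ? dynamic_cast<pair*>(c->rest) : nullptr;
    auto* e = t ? dynamic_cast<pair*>(t->rest) : nullptr;
    if (!t) throw std::runtime_error("if needs a condition and a THEN expression");
    if (eval(c->first) != nullptr) return t->first;   // assumption: nil is false
    return e ? e->first : nullptr;
});

// Hypothetical builtin that blows up if it is ever called.
static obj* const boom = new builtin(false, [](obj*) -> obj* {
    throw std::runtime_error("this branch should not have been evaluated");
});
```

Because if_op is flagged as a macro, evaluating (if_op 1 42 (boom)) never touches the ELSE branch, which is exactly what an ordinary function could not guarantee.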
(The Scheme community has new and interesting ways to make macros safer and easier to use. I strongly disagree with this. If macros are easy to use, people might start using them, and that's only going to lead to trouble.)
5. Errors are just C++ exceptions
The easy way to handle errors here is to just throw a C++ exception where necessary. We give them a common base class to distinguish them from other types of exceptions, but that's pretty much it.
Need a stack trace? Put a std::vector in the base class and wrap
eval or call in a try block that adds the details to it. Easy!
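One way the stack-trace idea could look, as a rough sketch. The names sic_error and traced are hypothetical, and the three small functions only stand in for nested eval or call invocations.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical common base class for interpreter errors.
class sic_error : public std::runtime_error {
public:
    std::vector<std::string> trace;   // innermost frame first
    explicit sic_error(const std::string& msg) : std::runtime_error(msg) {}
};

// Runs body(); if a sic_error passes through, records frameName on it and
// lets the same exception object keep propagating.
template <typename Body>
auto traced(const std::string& frameName, Body body) -> decltype(body()) {
    try {
        return body();
    } catch (sic_error& e) {
        e.trace.push_back(frameName);
        throw;   // rethrows the original object, including the updated trace
    }
}

// Stand-ins for nested evaluations.
static int inner() { throw sic_error("division by zero"); }
static int middle() { return traced("(div 1 0)", inner); }
static int outer() { return traced("(print (div 1 0))", middle); }
```

Catching sic_error around outer() gives a trace holding the two frame names, innermost first. Catching by non-const reference and using a bare throw matters here: it keeps one exception object alive all the way up instead of copying it at each level.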
6. Local namespaces are a linked list of std::map-holding objects
Now that we're actually interpreting code, we need variables. This is
pretty simple, right? C++ has std::map which does pretty much
everything we need. Just wrap it with a class:
class context {
private:
    std::map<std::string, obj*> items;
public:
    void set(const std::string& name, obj* value) { ... }
    obj* get(const std::string& name) const { ... }
};
And that's all we need--no, wait, there's also a global scope so it needs to fall through to that. So we add a pointer to an outer scope:
context * const parent;
and make set and get fall through to the parent if it's not
local. Easy!
No, wait. If set falls through, I have to make defining a
variable in a local scope a separate operation so that stuff like
this will work:
(let ( (outer nil) )
  (let ( (x 1) )
    (setq outer 42)))
We don't want setq to define a new variable outer in its
scope; it should be writing to the existing outer. So we need to
make defining variables and assigning to them separate things. We do
this by making set throw an exception if it can't find the variable,
then adding a define method:
void define(const std::string& name, obj* value) { ... }
And that... works?
Looks around nervously.
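For the nervous, here is a sketch of how define, set and get might divide the work. The method bodies and the use of std::runtime_error are assumptions; the article elides the real ones.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

class obj { public: virtual ~obj() = default; };

class context {
private:
    std::map<std::string, obj*> items;
    context* const parent;
public:
    explicit context(context* outer = nullptr) : parent(outer) {}

    // Create (or overwrite) a variable in this scope only.
    void define(const std::string& name, obj* value) { items[name] = value; }

    // Assign to the nearest existing variable; never creates a new one.
    void set(const std::string& name, obj* value) {
        auto found = items.find(name);
        if (found != items.end()) { found->second = value; return; }
        if (!parent) throw std::runtime_error("undefined variable: " + name);
        parent->set(name, value);
    }

    // Look the name up here first, then in each enclosing scope.
    obj* get(const std::string& name) const {
        auto found = items.find(name);
        if (found != items.end()) return found->second;
        if (!parent) throw std::runtime_error("undefined variable: " + name);
        return parent->get(name);
    }
};
```

With this split, the nested let example behaves as intended: the inner scope defines x locally, while assigning to outer walks up and overwrites the binding in the enclosing scope rather than creating a second one.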
Next, we need to add the context as an argument to callable::call:
virtual obj* call(obj* actualArgs, context* outer) const = 0;
and propagate it to eval and whatever else needs it.
Because built-in functions now get a pointer to the caller's context,
they can modify the caller's variables. Which is generally a Bad
Thing unless we need to write set, which we do. So it's actually a
good thing, I guess.
(We also have to write the macro setq--which expands to
set--because typing that extra quote is so burdensome. No,
seriously, it's a huge pain--the number of times I forgot it is
basically the number of times I wrote buggy Sic code.)
We also need this when defining lambdas because (as Bob over there knows), they can access the scope in which they are defined. That is, this:
(defun return-x-f (x) (lambda () x))
(setq fn (return-x-f 42))
(print (fn))
will print 42, because the lambda holds on to the outer function's context.
So lambda (well, its back-end--it's a macro, after all) gets a
pointer to the caller's context and stashes it in the function
object:
context *outer;
When the lambda gets called, it creates its context (the equivalent of
a stack frame in C++) and sets outer as the parent.
Conveniently, non-lambda functions (ironically-named fun) are just
like lambdas except that their outer pointer just points to the
global context (i.e. the outermost parent).
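The closure mechanism can be sketched in miniature like this. It is deliberately simplified and rests on assumptions: eval here only knows how to look up symbols, call takes a std::vector instead of a Lispish argument list, and return_x_f builds by hand what (defun return-x-f (x) (lambda () x)) would produce when called.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

class obj { public: virtual ~obj() = default; };
class number : public obj {
public:
    const double value;
    explicit number(double v) : value(v) {}
};
class symbol : public obj {
public:
    const std::string text;
    explicit symbol(const std::string& v) : text(v) {}
};

class context {
    std::map<std::string, obj*> items;
    context* const parent;
public:
    explicit context(context* outer = nullptr) : parent(outer) {}
    void define(const std::string& name, obj* value) { items[name] = value; }
    obj* get(const std::string& name) const {
        auto found = items.find(name);
        if (found != items.end()) return found->second;
        if (!parent) throw std::runtime_error("undefined variable: " + name);
        return parent->get(name);
    }
};

// Toy eval: symbols are looked up, everything else is returned as-is.
static obj* eval(obj* expr, context* ctx) {
    if (auto* sym = dynamic_cast<symbol*>(expr)) return ctx->get(sym->text);
    return expr;
}

class function : public obj {
    const std::vector<std::string> params;
    obj* const body;        // a single expression, for brevity
    context* const outer;   // the scope the lambda was created in
public:
    function(std::vector<std::string> p, obj* b, context* o)
        : params(std::move(p)), body(b), outer(o) {}

    obj* call(const std::vector<obj*>& args) const {
        if (args.size() != params.size())
            throw std::runtime_error("wrong number of arguments");
        // A fresh frame whose parent is the defining scope, not the caller's.
        // It is never freed, in keeping with the article's memory strategy.
        auto* frame = new context(outer);
        for (std::size_t i = 0; i < params.size(); ++i)
            frame->define(params[i], args[i]);
        return eval(body, frame);
    }
};

// Emulates calling return-x-f: a frame binding x, plus a lambda made inside it.
static function* return_x_f(context* global, obj* x) {
    auto* frame = new context(global);
    frame->define("x", x);
    return new function({}, new symbol("x"), frame);
}
```

Calling the returned function later with no arguments still finds x, because its new frame chains to the frame in which the lambda was created.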
Finally, we need to actually look up variables. That ends up
being a one-liner in eval:
if (expr->isSymbol()) { return context->get(expr->text); }
And that's all.
7. Oh yeah, I forgot to talk about Symbols
As Bob--hey, where'd he go? Anyway, as Bob knows, Lispish languages have this concept of a symbol, which is different from a string. A symbol is a chunk of text but it represents an internal variable name.
If you hand eval a string, it just gives you the string back but if
you hand it a symbol, it'll look it up in the current context and give
you the value back instead. Which you already know, because you read
the last section. Right?
So in Sic, a symbol class is just an obj subclass that holds a
std::string. Except the field has a different name from the Sic
string (text vs contents) so that I can't accidentally use one
instead of the other.
class symbol : public obj {
public:
    const std::string text;
    explicit symbol(const std::string& v) : text(v) {}
};
There's nothing really magical about it except that eval can tell
the difference between the two and handles them differently.
Oh, and also, I did a clever thing in the symbol class where I
guarantee that there's only ever one symbol for a particular series
of characters. This ensures the Lispish requirement that symbols be
unique and also lets you test equality in C++ by comparing the
pointers with ==.
So it really looks (more) like this:
class symbol : public obj {
private:
    inline static std::map<std::string, symbol*> symbols;
    explicit symbol(const std::string& v) : text(v) {}
public:
    const std::string text;
    static symbol* intern(const std::string& s) {
        if (symbols.count(s) == 0) { symbols[s] = new symbol(s); }
        return symbols[s];
    }
};
(Basically, I make the constructor private and provide a public static
method called intern that calls it, but only if there's not already
an instance in symbols. In that case, it stashes the new symbol
there before returning it. Otherwise, it returns a pointer to the
symbol that's already stashed.
This all works because Sic types are immutable (barring C++ type abuse, that is).)
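A standalone sketch of the interning trick, trimmed so that it compiles without the rest of the type hierarchy (the obj base class is left out). It assumes C++17 for the inline static member and uses a single map lookup on the common path, which is a small variation on the version shown above rather than the article's exact code.

```cpp
#include <cassert>
#include <map>
#include <string>

class symbol {
private:
    // One shared table of every symbol created so far (C++17 inline static).
    inline static std::map<std::string, symbol*> symbols;
    explicit symbol(const std::string& v) : text(v) {}
public:
    const std::string text;

    // Returns the unique symbol for s, creating it on first use.
    static symbol* intern(const std::string& s) {
        auto found = symbols.find(s);
        if (found == symbols.end()) {
            found = symbols.emplace(s, new symbol(s)).first;
        }
        return found->second;
    }
};
```

Since the constructor is private, intern is the only way to get a symbol, so symbol::intern("foo") == symbol::intern("foo") holds as a plain pointer comparison while symbols with different text get different addresses.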
The other thing I need to mention is that code like
$("setq", "foo", 42)
that I used above doesn't actually work the way you'd naively
expect. The arguments are strings which eval won't look up. We
need to make them into symbols.
Unfortunately, C++ doesn't have a symbol type and we already transparently turn C++ strings into Sic strings, so there's not an obvious conversion.
So we make it explicit with a helper function. Which I name $$,
'cuz why not:
static inline obj* $$(const std::string& s) { return symbol::intern(s); }
So now, we can do explicit symbols like this:
$( $$("setq"), $$("foo"), 42)
Which is only slightly uglier.
(To make this slightly easier, sic.hpp also defines a bunch of
global consts that hold pointers to the corresponding functions. This
lets you replace the above with:
$(set, $$("foo"), 42)
which is a bit nicer.)
8. read is complex and ugly but uninteresting
So you'll note that at this point (assuming I've also written a bunch of useful built-in functions), we pretty much have a working(ish) programming language. I can do stuff like this:
$(progn,
  $(print, "starting!\n"),
  $(defun, $$("fib"), $( $$("n") ),
    $(if_op, $(le, $$("n"), 1),
      $(list, 1),
      $(if_op, $(eq_p, $$("n"), 2),
        $(list, 1, 1),
        $(let, $( $( $$("prev"), $( $$("fib"), $(sub, $$("n"), 1) ) ) ),
          $(pair_op,
            $(add, $(first, $$("prev")),
                   $(second, $$("prev"))), $$("prev") )
        )
      )
    )
  ),
  $(print,
    $( $$("fib"), $(str_to_num, $(third, $$("argv")) ) ),
    "\n")
)
;
and it works.
Lispish source code is basically just a text serialization of its data types--primordial JSON, as it were--so all I really need in order to write and evaluate scripts is a function to parse lists, names and literal types.
In most Lisps, this is called read so I called it that too.
read is written in pure C++ and does pretty much what you'd expect
with std::stream and std::string. Boring, in other words. But it
works.
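For a sense of scale, a toy reader along these lines fits in a few dozen lines. This is a sketch under assumptions and not the real Sic reader: it is named readExpr to stay clear of the POSIX read function, uses stub types, treats both # and ; as comment starters, and does not handle string escapes or quoting.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

class obj { public: virtual ~obj() = default; };
class number : public obj { public: const double value; explicit number(double v) : value(v) {} };
class string : public obj { public: const std::string contents; explicit string(const std::string& s) : contents(s) {} };
class symbol : public obj { public: const std::string text; explicit symbol(const std::string& s) : text(s) {} };
class pair : public obj { public: obj* const first; obj* const rest; pair(obj* a, obj* d) : first(a), rest(d) {} };

static const int kEof = std::char_traits<char>::eof();

// Skip whitespace and comments; '#' and ';' both run to the end of the line.
static void skipBlanks(std::istream& in) {
    for (;;) {
        int c = in.peek();
        if (c == '#' || c == ';') {
            while (in.good() && in.get() != '\n') {}
        } else if (c != kEof && std::isspace(c)) {
            in.get();
        } else {
            return;
        }
    }
}

static obj* readExpr(std::istream& in);

// Read list items up to the closing ')'; the empty list is a null pointer.
static obj* readRest(std::istream& in) {
    skipBlanks(in);
    if (in.peek() == kEof) throw std::runtime_error("unterminated list");
    if (in.peek() == ')') { in.get(); return nullptr; }
    obj* head = readExpr(in);
    return new pair(head, readRest(in));
}

static obj* readExpr(std::istream& in) {
    skipBlanks(in);
    int c = in.get();
    if (c == kEof) throw std::runtime_error("unexpected end of input");
    if (c == ')') throw std::runtime_error("unexpected ')'");
    if (c == '(') return readRest(in);
    if (c == '"') {                      // string literal, no escapes
        std::string text;
        while ((c = in.get()) != '"') {
            if (c == kEof) throw std::runtime_error("unterminated string");
            text += static_cast<char>(c);
        }
        return new string(text);
    }
    // Anything else is a token that runs until whitespace or a parenthesis.
    std::string token(1, static_cast<char>(c));
    while ((c = in.peek()) != kEof && !std::isspace(c) && c != '(' && c != ')') {
        token += static_cast<char>(in.get());
    }
    // A token that strtod consumes completely is a number (note that strtod
    // also accepts forms such as "inf" or "0x10"); otherwise it is a symbol.
    char* end = nullptr;
    double value = std::strtod(token.c_str(), &end);
    if (end != token.c_str() && *end == '\0') return new number(value);
    return new symbol(token);
}
```

Feeding it a std::istringstream holding (setq foo 42 "bar" (add 1 2)) produces a list of two symbols, a number, a string and a nested list, which is the same shape the $ helper builds by hand.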
I did make one attempt to be innovative and modern and made the
comment character # instead of ; because that's more Unixy and you
can do the #! thing to launch scripts. Of course, editors still
expect ; so I ended up putting back ; as an alternate comment
character.
So now you have two comment characters for the price of one. Lucky you.
8. Garbage Collection would be nice but it's too much work.
Unlike most Lispish languages, Sic implements garbage collection by having me suggest that you exit your program sometime before your computer runs out of RAM. After that, object memory is reclaimed very efficiently.
(It would be (sigh) relatively straightforward to create an abstract
base class for obj and context that stores all of the instances in
a global registry and provides marking for mark-and-sweep, but that's
a lot more work than I want to do right now. It's probably easier to
just grab the Boehm GC and use that.)
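For the curious, the registry-plus-marking idea might look something like the sketch below. Nothing here exists in Sic: the collectable base class, markChildren and collect are hypothetical names, every object is assumed to be allocated with new, and marking is recursive, so a very long list could exhaust the C++ stack.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

class collectable {
    // Every live instance registers itself here (C++17 inline static).
    inline static std::unordered_set<collectable*> registry;
    bool marked = false;
protected:
    collectable() { registry.insert(this); }
    // Subclasses override this to call mark() on everything they point to.
    virtual void markChildren() {}
public:
    virtual ~collectable() { registry.erase(this); }
    collectable(const collectable&) = delete;
    collectable& operator=(const collectable&) = delete;

    void mark() {
        if (marked) return;   // already visited; this also stops cycles
        marked = true;
        markChildren();
    }

    // Mark everything reachable from the roots, then delete the rest.
    // Returns the number of objects freed.
    static std::size_t collect(const std::vector<collectable*>& roots) {
        for (collectable* r : roots) {
            if (r) r->mark();
        }
        std::vector<collectable*> dead;
        for (collectable* c : registry) {
            if (c->marked) c->marked = false;   // reset for the next cycle
            else dead.push_back(c);
        }
        for (collectable* c : dead) delete c;   // the destructor unregisters
        return dead.size();
    }

    static std::size_t liveCount() { return registry.size(); }
};

class number : public collectable {
public:
    const double value;
    explicit number(double v) : value(v) {}
};

class pair : public collectable {
public:
    collectable* const first;
    collectable* const rest;
    pair(collectable* a, collectable* d) : first(a), rest(d) {}
protected:
    void markChildren() override {
        if (first) first->mark();
        if (rest) rest->mark();
    }
};
```

The roots would be whatever the interpreter can still reach, typically the root context and any contexts on the current call chain; a context subclass would mark each value in its map plus its parent.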
9. It's no longer fun but I can't stop. Help!
So now that I have read, I can split Sic up into a library and a
script runner.
The library gets a function called root_context() that creates a
global context (i.e. one with no parent) and loads it up with all of
the built-in functions. A neat side effect of this is that you can
now have multiple Sic instances in your program; just create a new
root context for each.
The runner links to the library, calls root_context() and either
loads in the script you give it or lets you type in commands. (If you
want readline support, rlwrap is available.)
Oh, but it'd be nice to have a unit test framework. So I'll add extra
testing builtins but only if the script name ends with .sictest.
Which turns out to be much trickier than it looks.
So once that's working, I should probably test most of these
functions. But I don't really have proper equality testing so I
should add that. And tests. Also, I should write examples. But
this example would be much easier if I had cond (plus tests).
And cond makes or really easy, so I should add that (plus tests).
But or implies and, so I should add that as well (plus tests).
And I really should--
On second thought, it's done now.
10. Screw it. It's on Github now.
If you want to play with the code, it's here. I'm releasing it under the terms of the wxWidgets license, which is basically the GNU LGPL but with fewer restrictions on using it in your own programs.
Have fun. Or not.
# Posted 2019-06-22 23:49:10 UTC; last changed 2021-05-07 02:00:43 UTC