Sic: Yet Another Mediocre Small Lisp Dialect

Like many bad ideas, this one came from seeking attention on social media. I had envisioned tooting something like this on Mastodon:

Me? Oh, I was just writing Lisp code in C++. As one does.

(dontforgettolikeandsubscribe)

This was the result of a series of realizations I had while learning more about modern C++. Specifically:

1. $ is a valid 'word' character in C++ these days.

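That means you can use it in names. (Strictly speaking it's a compiler extension rather than standard C++, but GCC, Clang and MSVC all accept it.) So this compiles: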

const int $20_same_as_in_town = 20;

But since this is not widely known, I can use it for all kinds of shenanigans.

2. You can use variadic templates to fake Lisp-style expressions.

As you know Bob, variadic templates are function templates that take arbitrarily many arguments. And since they're templates, the types of those arguments are also up for grabs.

And as you also know Bob, Lisp-style lists are just linked lists of pairs of pointers where the first pointer holds the value and the second the next pair in the sequence. And Lisp expressions are just lists of expressions where the first value is the function to call while the rest are its arguments.

So if we create a bunch of C++ classes to represent basic Lispish types:

class string : public obj { ... };
class symbol : public obj { ... };
class number : public obj { ... };

plus a common base class:

class obj { ... };

plus a simple C++ class to hold a pair:

class pair : public obj {
public:
    obj * const first;      // Not named 'car'; cope.
    obj * const rest;       // Ditto for 'cdr'.
    pair(obj *a, obj *d) : first(a), rest(d) {}
};

and a set of overloaded helper functions to convert basic C++ types to Lispish types:

static inline obj* _w(std::string s)  { return new string(s); }
static inline obj* _w(int i)          { return new number((long)i); }
static inline obj* _w(long l)         { return new number(l); }
static inline obj* _w(double d)       { return new number(d); }
static inline obj* _w(obj *o)         { return o; }

we can create a variadic function template that constructs a Lispish list:

template<typename T> obj* $(T o) { return new pair(_w(o), nil); }

template<typename T, typename... Objects>
obj* $(T first, Objects... rest) { 
    return new pair(_w(first), $(rest...)); 
}

(The first definition of $ works on any call with one argument; the second expands to a function that takes the first argument and recurses on the rest.)

Calling it looks like this:

$("foo", 42, $("add", 2, 2))

which looks Lispish if you squint hard enough. But the resulting list is an actual Lisp-style list.
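For the skeptical, here is a minimal self-contained sketch of the pieces so far. It is not the real sic.hpp: the string class is renamed str, the converters are called wrap instead of _w, nil is assumed to be a null pointer, and show is a hypothetical helper that renders the result as text so you can see the structure.

```cpp
#include <string>
#include <utility>

// Stand-ins for the Sic types (assumption: the real classes do more).
struct obj { virtual ~obj() = default; };
struct str : obj {               // named 'str' here to avoid confusion with std::string
    std::string contents;
    explicit str(std::string s) : contents(std::move(s)) {}
};
struct number : obj {
    long value;
    explicit number(long v) : value(v) {}
};
struct pair : obj {
    obj* const first;
    obj* const rest;
    pair(obj* a, obj* d) : first(a), rest(d) {}
};
obj* const nil = nullptr;        // assumption: the empty list is a null pointer

// Convert C++ values to Sic objects (called _w in the text).
inline obj* wrap(const char* s) { return new str(s); }
inline obj* wrap(int i)         { return new number(i); }
inline obj* wrap(obj* o)        { return o; }

// One argument: a single-item list.
template<typename T> obj* $(T o) { return new pair(wrap(o), nil); }

// Several arguments: wrap the first, recurse on the rest.
template<typename T, typename... Objects>
obj* $(T first, Objects... rest) {
    return new pair(wrap(first), $(rest...));
}

// Hypothetical helper: render a value as text so the structure is visible.
std::string show(const obj* o) {
    if (o == nil) return "()";
    if (auto s = dynamic_cast<const str*>(o)) return "\"" + s->contents + "\"";
    if (auto n = dynamic_cast<const number*>(o)) return std::to_string(n->value);
    auto p = dynamic_cast<const pair*>(o);
    if (!p) return "?";
    std::string out = "(";
    for (; p; p = dynamic_cast<const pair*>(p->rest)) {
        if (out.size() > 1) out += " ";
        out += show(p->first);
    }
    return out + ")";
}
```

With that, show($("foo", 42, $("add", 2, 2))) gives back ("foo" 42 ("add" 2 2)), nested list and all.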

3. Evaluating functions is pretty straightforward.

A function is just one of the Lispish C++ types. It implements a method named call, which takes an argument list and returns a result:

class callable : public obj {
public:
    virtual obj* call(obj* actualArgs) const = 0;
};

Yeah, yeah, this is an abstract base class. We actually need two types of callables:

class function : public callable { ... };
class builtin : public callable { ... };

function is a function written in Sic. It holds a Lispish list of expressions (also Lispish lists) and evaluates them by calling eval on each of them. builtin holds a built-in function--a C++ lambda (or other function pointer, in theory)--that it calls instead.

eval is the function that evaluates a Sic expression. You give it the expression and if it's a list, it recursively calls itself on each item and collects the results. Then, it calls the first item's call method with the rest of the list as an argument and returns the result. If the argument isn't a list, it just returns it.

So like any good Lisp function, it either does nothing significant or recurses.
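Here is a rough sketch of that recursion, under a few assumptions: the builtin wraps a std::function, the end of a list is a null pointer, and the callable object itself sits at the head of the list (no symbol lookup yet). The names eval_each and add_numbers are made up for the example.

```cpp
#include <functional>
#include <stdexcept>
#include <utility>

struct obj { virtual ~obj() = default; };
struct number : obj {
    long value;
    explicit number(long v) : value(v) {}
};
struct pair : obj {
    obj* const first;
    obj* const rest;
    pair(obj* a, obj* d) : first(a), rest(d) {}
};
struct callable : obj {
    virtual obj* call(obj* actualArgs) const = 0;
};
// Assumption: a builtin wraps a std::function; the real class may differ.
struct builtin : callable {
    std::function<obj*(obj*)> fn;
    explicit builtin(std::function<obj*(obj*)> f) : fn(std::move(f)) {}
    obj* call(obj* actualArgs) const override { return fn(actualArgs); }
};

obj* eval(obj* expr);

// Evaluate each item of a list, left to right, into a new list.
obj* eval_each(obj* list) {
    auto p = dynamic_cast<pair*>(list);
    if (!p) return nullptr;                       // end of list
    obj* head = eval(p->first);
    return new pair(head, eval_each(p->rest));
}

obj* eval(obj* expr) {
    auto p = dynamic_cast<pair*>(expr);
    if (!p) return expr;                          // not a list: return it unchanged
    auto results = dynamic_cast<pair*>(eval_each(p));
    auto fn = dynamic_cast<callable*>(results->first);
    if (!fn) throw std::runtime_error("first item is not callable");
    return fn->call(results->rest);               // first item gets the rest as arguments
}

// Hypothetical built-in: sum the numbers in the argument list.
obj* add_numbers(obj* args) {
    long total = 0;
    for (auto p = dynamic_cast<pair*>(args); p; p = dynamic_cast<pair*>(p->rest)) {
        if (auto n = dynamic_cast<number*>(p->first)) total += n->value;
    }
    return new number(total);
}
```

Wrap add_numbers in a builtin, hand eval the list (add 2 (add 3 4)), and the inner list gets evaluated to 7 before the outer call sees it.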

4. Also, I can do macros because I hate myself.

As you know Bob--hey, why are you walking away? I NEED YOU FOR THIS RHETORICAL DEVICE, BOB!

Anyway, as Bob over there already knows, a macro is a powerful and elegant way to let programmers mutate their Lisp into a completely different language while introducing subtle and impossible-to-find bugs.

More precisely, it's a function that gets called on the raw, unevaluated argument list, does something with that and returns something else that does get evaluated as a normal Lispish expression. On compiled Lisps (i.e. not this one), it gets called by the compiler.

The thing is, we need macros for system-ish and control-flow-ish stuff. You can't do an if statement if eval is always going to evaluate the THEN and ELSE expressions regardless.

So we do macros by adding the isMacro flag to callable:

class callable : public obj {
public:
    const bool isMacro;
    virtual obj* call(obj* actualArgs) const = 0;
};

(isMacro gets set by the constructor.)

Then we make eval check if it's true. If it is, it calls the function on the arguments first, before they're evaluated, captures the result and recursively evaluates that.
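A sketch of what that check might look like. To keep it short, callable is collapsed into a single concrete class holding a std::function (the real thing is an abstract base with two subclasses), anything that isn't a null pointer counts as true, and expand_if is a hypothetical macro body invented for this example.

```cpp
#include <functional>
#include <stdexcept>
#include <utility>

struct obj { virtual ~obj() = default; };
struct number : obj {
    long value;
    explicit number(long v) : value(v) {}
};
struct pair : obj {
    obj* const first;
    obj* const rest;
    pair(obj* a, obj* d) : first(a), rest(d) {}
};
// Simplification: one concrete class instead of function/builtin subclasses.
struct callable : obj {
    const bool isMacro;
    std::function<obj*(obj*)> fn;
    callable(bool macro, std::function<obj*(obj*)> f) : isMacro(macro), fn(std::move(f)) {}
    obj* call(obj* actualArgs) const { return fn(actualArgs); }
};

obj* eval(obj* expr);

obj* eval_each(obj* list) {
    auto p = dynamic_cast<pair*>(list);
    if (!p) return nullptr;
    obj* head = eval(p->first);
    return new pair(head, eval_each(p->rest));
}

obj* eval(obj* expr) {
    auto p = dynamic_cast<pair*>(expr);
    if (!p) return expr;
    auto fn = dynamic_cast<callable*>(eval(p->first));
    if (!fn) throw std::runtime_error("first item is not callable");
    if (fn->isMacro) {
        // Macro: raw arguments go in, and whatever comes out gets evaluated.
        return eval(fn->call(p->rest));
    }
    return fn->call(eval_each(p->rest));          // ordinary call: evaluate arguments first
}

// Hypothetical macro body for (if COND THEN ELSE): evaluates COND itself and
// hands back whichever branch should run, still unevaluated.
obj* expand_if(obj* args) {
    auto condCell = dynamic_cast<pair*>(args);
    auto thenCell = condCell ? dynamic_cast<pair*>(condCell->rest) : nullptr;
    auto elseCell = thenCell ? dynamic_cast<pair*>(thenCell->rest) : nullptr;
    if (!elseCell) throw std::runtime_error("if needs three arguments");
    // Assumption: a null pointer (nil) is false, everything else is true.
    return eval(condCell->first) ? thenCell->first : elseCell->first;
}
```

The branch that isn't chosen never reaches eval, which is the whole reason for the exercise.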

And there you go. Self-modifying code made easy.

(The Scheme community has new and interesting ways to make macros safer and easier to use. I strongly disagree with this. If macros are easy to use, people might start using them, and that's only going to lead to trouble.)

5. Errors are just C++ exceptions

The easy way to handle errors here is to just throw a C++ exception where necessary. We give them a common base class to distinguish them from other types of exceptions, but that's pretty much it.

Need a stack trace? Put a std::vector in the base class and wrap eval or call in a try block that adds the details to it. Easy!
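Something along these lines, as a sketch only. The names sic_error and traced are invented for the example, and the trace here is just a vector of strings; the real code presumably records something more useful.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical common base class for interpreter errors.
class sic_error : public std::runtime_error {
public:
    std::vector<std::string> trace;     // innermost frame first
    explicit sic_error(const std::string& msg) : std::runtime_error(msg) {}
};

// Run body(); if a sic_error passes through, note the frame and rethrow.
// This is the try block you would wrap around eval or call.
template<typename Fn>
auto traced(const std::string& frame, Fn body) -> decltype(body()) {
    try {
        return body();
    } catch (sic_error& e) {
        e.trace.push_back(frame);       // caught by reference, so the change sticks
        throw;                          // rethrow the same exception object
    }
}
```

Each level of the call stack adds its own entry on the way out, so by the time the error reaches the top the vector reads innermost to outermost.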

6. Local namespaces are a linked list of std::map-holding objects

Now that we're actually interpreting code, we need variables. This is pretty simple, right? C++ has std::map which does pretty much everything we need. Just wrap it with a class:

class context {
private:
    std::map<std::string, obj*> items;
public:
    void set(const std::string& name, obj* value) { ... }
    obj* get(const std::string& name) const { ... }
};

And that's all we need--no, wait, there's also a global scope so it needs to fall through to that. So we add a pointer to an outer scope:

    context * const parent;

and make set and get fall through to the parent if it's not local. Easy!

No, wait. set falling through means I have to make defining a variable in a local scope a separate operation, so that stuff like this will work:

(let ( (outer nil) )
    (let ( (x 1) )
        (setq outer 42)))

We don't want setq to define a new variable outer in its scope; it should be writing to the existing outer. So we need to make defining variables and assigning to them separate things. We do this by making set throw an exception if it can't find the variable, then adding a define method:

    void define(const std::string& name, obj* value) { ... }

And that... works?

Looks around nervously.
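For the record, a minimal sketch of the three methods together. It assumes get throws on an unknown name just like set does, and that a missing variable is reported with a plain std::runtime_error; neither detail is spelled out above.

```cpp
#include <map>
#include <stdexcept>
#include <string>

struct obj { virtual ~obj() = default; };

class context {
private:
    std::map<std::string, obj*> items;
public:
    context* const parent;
    explicit context(context* outer = nullptr) : parent(outer) {}

    // Always creates (or overwrites) the variable in this scope.
    void define(const std::string& name, obj* value) { items[name] = value; }

    // Assigns to the nearest enclosing scope that already has the variable.
    void set(const std::string& name, obj* value) {
        for (context* c = this; c; c = c->parent) {
            auto it = c->items.find(name);
            if (it != c->items.end()) { it->second = value; return; }
        }
        throw std::runtime_error("undefined variable: " + name);
    }

    // Looks the variable up, falling through to the parents.
    obj* get(const std::string& name) const {
        for (const context* c = this; c; c = c->parent) {
            auto it = c->items.find(name);
            if (it != c->items.end()) return it->second;
        }
        throw std::runtime_error("undefined variable: " + name);
    }
};
```

In the nested let example, the inner scope defines x locally, while the setq walks up one level and overwrites the existing outer instead of creating a new one.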

Next, we need to add the context as an argument to callable::call:

    virtual obj* call(obj* actualArgs, context* outer) const = 0;

and propagate it to eval and whatever else needs it.

Because built-in functions now get a pointer to the caller's context, they can modify the caller's variables. Which is generally a Bad Thing unless we need to write set, which we do. So it's actually a good thing, I guess.

(We also have to write the macro setq--which expands to set--because typing that extra quote is so burdensome. No, seriously, it's a huge pain--the number of times I forgot it is basically the number of times I wrote buggy Sic code.)

We also need this when defining lambdas because (as Bob over there knows), they can access the scope in which they are defined. That is, this:

(defun return-x-f (x) (lambda () x))
(setq fn (return-x-f 42))
(print (fn))

will print 42, because the lambda holds on to the outer function's context.

So lambda (well, its back-end--it's a macro, after all) gets a pointer to the caller's context and stashes it in the function object:

    context *outer;

When the lambda gets called, it creates its context (the equivalent of a stack frame in C++) and sets outer as the parent.

Conveniently, non-lambda functions (ironically-named fun) are just like lambdas except that their outer pointer just points to the global context (i.e. the outermost parent).
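To make the capture concrete, here is a stripped-down sketch. It cheats in several labeled ways: the function body is a C++ callback rather than a list of Sic expressions, the context is a bare struct whose get returns a null pointer for unknown names, and return_x_f is a hand-written C++ stand-in for the defun in the example above.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

struct obj { virtual ~obj() = default; };
struct number : obj {
    long value;
    explicit number(long v) : value(v) {}
};

// Bare-bones scope: just enough to show the parent chain.
struct context {
    std::map<std::string, obj*> items;
    context* const parent;
    explicit context(context* outer) : parent(outer) {}
    obj* get(const std::string& name) const {
        for (const context* c = this; c; c = c->parent) {
            auto it = c->items.find(name);
            if (it != c->items.end()) return it->second;
        }
        return nullptr;                 // assumption: unknown names yield nil
    }
};

// Stand-in for a Sic function object.
struct function : obj {
    context* const outer;                       // scope captured at definition time
    std::function<obj*(context*)> body;         // cheat: C++ callback instead of Sic code
    function(context* definedIn, std::function<obj*(context*)> b)
        : outer(definedIn), body(std::move(b)) {}
    obj* call() const {
        // Fresh frame per call, parented to the captured scope.
        // (Never freed; see the garbage collection section.)
        context* frame = new context(outer);
        return body(frame);
    }
};

// C++ rendition of (defun return-x-f (x) (lambda () x)).
function* return_x_f(context* global, obj* x) {
    context* frame = new context(global);       // the outer function's frame
    frame->items["x"] = x;
    return new function(frame, [](context* scope) { return scope->get("x"); });
}
```

The lambda's own frame has no x in it, so the lookup falls through to the frame that return_x_f left behind, which is still alive because the function object points at it.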

Finally, we need to actually look up variables. That ends up being a one-liner in eval:

if (expr->isSymbol()) { return context->get(expr->text); }

And that's all.

7. Oh yeah, I forgot to talk about Symbols

As Bob--hey, where'd he go? Anyway, as Bob knows, Lispish languages have this concept of a symbol, which is different from a string. A symbol is a chunk of text but it represents an internal variable name.

If you hand eval a string, it just gives you the string back but if you hand it a symbol, it'll look it up in the current context and give you the value back instead. Which you already know, because you read the last section. Right?

So in Sic, the symbol class is just an obj subclass that holds a std::string. Except the field has a different name from the Sic string (text vs contents) so that I can't accidentally use one instead of the other.

class symbol : public obj {
public:
    const std::string text;
    explicit symbol(const std::string& v) : text(v) {}
};

There's nothing really magical about it except that eval can tell the difference between the two and handles them differently.

Oh, and also, I did a clever thing in the symbol class where I guarantee that there's only ever one symbol for a particular series of characters. This ensures the Lispish requirement that symbols be unique and also lets you test equality in C++ by comparing the pointers with ==.

So it really looks (more) like this:

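class symbol : public obj {
private:
    // One entry per distinct name; shared by all symbols.
    inline static std::map<std::string, symbol*> symbols;
    explicit symbol(const std::string& v) : text(v) {}

public:
    const std::string text;

    // Return the one-and-only symbol for s, creating it on first use.
    static symbol* intern(std::string s) {
        if (symbols.count(s) == 0) { symbols[s] = new symbol(s); }
        return symbols[s];
    }
};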

(Basically, I make the constructor private and provide a public static method called intern that calls it, but only if there's not already an instance in symbols. In that case, it stashes the new symbol there before returning it. Otherwise, it returns a pointer to the symbol that's already stashed.

This all works because Sic types are immutable (barring C++ type abuse, that is).)

The other thing I need to mention is that code like

$("setq", "foo", 42)

that I used above doesn't actually work the way you'd naively expect. The arguments are strings which eval won't look up. We need to make them into symbols.

Unfortunately, C++ doesn't have a symbol type and we already transparently turn C++ strings into Sic strings, so there's not an obvious conversion.

So we make it explicit with a helper function. Which I name $$, 'cuz why not:

static inline obj* $$(const std::string& s)  { return symbol::intern(s); }

So now, we can do explicit symbols like this:

$( $$("setq"), $$("foo"), 42)

Which is only slightly uglier.

(To make this slightly easier, sic.hpp also defines a bunch of global consts that hold pointers to the corresponding functions. This lets you replace the above with:

$(set, $$("foo"), 42)

which is a bit nicer.)

8. read is complex and ugly but uninteresting

So you'll note that at this point (assuming I've also written a bunch of useful built-in functions), we pretty much have a working(ish) programming language. I can do stuff like this:

$(progn,

  $(print, "starting!\n"),

  $(defun, $$("fib"), $( $$("n") ),
    $(if_op, $(le, $$("n"), 1),
      $(list, 1),
      $(if_op, $(eq_p, $$("n"), 2),
        $(list, 1, 1),
        $(let, $( $( $$("prev"), $( $$("fib"), $(sub, $$("n"), 1) ) ) ),
          $(pair_op, 
            $(add, $(first, $$("prev")), 
                   $(second, $$("prev"))), $$("prev") )
            )
          )
        )
      ),

  $(print,
    $( $$("fib"), $(str_to_num, $(third, $$("argv")) ) ),
    "\n")
    )
;

and it works.

Lispish source code is basically just a text serialization of its data types--primordial JSON, as it were--so all I really need in order to write and evaluate scripts is a function to parse lists, names and literal types.

In most Lisps, this is called read so I called it that too.

read is written in pure C++ and does pretty much what you'd expect with std::istream and std::string. Boring, in other words. But it works.

I did make one attempt to be innovative and modern and made the comment character # instead of ; because that's more Unixy and you can do the #! thing to launch scripts. Of course, editors still expect ; so I ended up putting back ; as an alternate comment character.

So now you have two comment characters for the price of one. Lucky you.
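If you want a feel for how boring, here is a toy reader in the same spirit. It is a sketch, not the real read: it only knows lists, integers, double-quoted strings without escapes, and bare symbols; symbols are not interned; the end of a list is a null pointer; and the helper names (skip_blanks, read_list_tail, END_OF_INPUT) are invented.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

struct obj { virtual ~obj() = default; };
struct number : obj { long value; explicit number(long v) : value(v) {} };
struct str : obj { std::string contents; explicit str(std::string s) : contents(std::move(s)) {} };
struct symbol : obj { std::string text; explicit symbol(std::string s) : text(std::move(s)) {} };
struct pair : obj {
    obj* const first;
    obj* const rest;
    pair(obj* a, obj* d) : first(a), rest(d) {}
};

const int END_OF_INPUT = std::char_traits<char>::eof();

// Skip whitespace and comments; both '#' and ';' run to the end of the line.
void skip_blanks(std::istream& in) {
    for (;;) {
        int c = in.peek();
        if (c == '#' || c == ';') {
            while (in.good() && in.get() != '\n') {}
        } else if (c != END_OF_INPUT && std::isspace(c)) {
            in.get();
        } else {
            return;
        }
    }
}

obj* read(std::istream& in);

// Read list items up to and including the closing paren.
obj* read_list_tail(std::istream& in) {
    skip_blanks(in);
    if (in.peek() == END_OF_INPUT) throw std::runtime_error("unterminated list");
    if (in.peek() == ')') { in.get(); return nullptr; }
    obj* head = read(in);
    return new pair(head, read_list_tail(in));
}

obj* read(std::istream& in) {
    skip_blanks(in);
    int c = in.get();
    if (c == END_OF_INPUT) throw std::runtime_error("unexpected end of input");
    if (c == '(') return read_list_tail(in);
    if (c == ')') throw std::runtime_error("unexpected ')'");
    if (c == '"') {                               // string literal, no escapes
        std::string text;
        for (c = in.get(); c != '"'; c = in.get()) {
            if (c == END_OF_INPUT) throw std::runtime_error("unterminated string");
            text += static_cast<char>(c);
        }
        return new str(text);
    }
    // Bare token: runs until whitespace or a paren.
    std::string token(1, static_cast<char>(c));
    for (c = in.peek();
         c != END_OF_INPUT && !std::isspace(c) && c != '(' && c != ')';
         c = in.peek()) {
        token += static_cast<char>(in.get());
    }
    // It's a number if the whole token parses as an integer, else a symbol.
    try {
        std::size_t used = 0;
        long value = std::stol(token, &used);
        if (used == token.size()) return new number(value);
    } catch (const std::logic_error&) {}
    return new symbol(token);
}
```

Feed it a std::istringstream and you get back the same kind of pair structure that $ builds by hand.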

9. Garbage Collection would be nice but it's too much work.

Unlike most Lispish languages, Sic implements garbage collection by having me suggest that you exit your program sometime before your computer runs out of RAM. After that, object memory is reclaimed very efficiently.

(It would be (sigh) relatively straightforward to create an abstract base class for obj and context that stores all of the instances in a global registry and provides marking for mark-and-sweep, but that's a lot more work than I want to do right now. It's probably easier to just grab the Boehm GC and use that.)

10. It's no longer fun but I can't stop. Help!

So now that I have read, I can split Sic up into a library and a script runner.

The library gets a function called root_context() that creates a global context (i.e. one with no parent) and loads it up with all of the built-in functions. A neat side effect of this is that you can now have multiple Sic instances in your program; just create a new root context for each.

The runner links to the library, calls root_context() and either loads in the script you give it or lets you type in commands. (If you want readline support, rlwrap is available.)

Oh, but it'd be nice to have a unit test framework. So I'll add extra testing builtins but only if the script name ends with .sictest. Which turns out to be much trickier than it looks.

So once that's working, I should probably test most of these functions. But I don't really have proper equality testing so I should add that. And tests. Also, I should write examples. But this example would be much easier if I had cond (plus tests). And cond makes or really easy, so I should add that (plus tests). But or implies and, so I should add that as well (plus tests).

And I really should--

On second thought, it's done now.

11. Screw it. It's on GitHub now.

If you want to play with the code, it's here. I'm releasing it under the terms of the wxWidgets license, which is basically the GNU LGPL but with fewer restrictions on using it in your own programs.

Have fun. Or not.


#   Posted 2019-06-22 23:49:10 UTC; last changed 2021-05-07 02:00:43 UTC