
Write your own language

I held a lunch seminar about writing your own Lisp language, called Write your own language. Here I'd like to summarize it for you.

The basics

I've used fslex and fsyacc to produce this language. They are quite cumbersome to set up, so I recommend that you clone branch "Step0" of my repository on GitHub.

git clone -b Step0 git@github.com:miklund/lisp.dsl.git

There are four important files in this project.

  • lexer.fsl
  • parser.fsy
  • statements.fs (the abstract syntax tree)
  • invoker.fs

These are pretty much self-explanatory: statements.fs defines the abstract syntax tree (AST), and invoker.fs invokes the AST by converting it to F# quotations.

Step 1 - Primitives

First up is specifying the primitives. Here's the lexing.

{
module Lexer
open System
open Parser
open Microsoft.FSharp.Text.Lexing
}

let whitespace = [' ' '\n' '\r']
let digit = ['0'-'9']
let number = '-'?digit+
let boolean = "t" | "nil"

rule tokenize line = parse
| whitespace { tokenize line lexbuf }
| number { NUMBER (Int32.Parse(LexBuffer<_>.LexemeString lexbuf)) }
| boolean { BOOLEAN((LexBuffer<_>.LexemeString lexbuf) = "t") }
| eof { END }

The important rows are:

  1. Define what a digit is.
  2. A number consists of 1..n digits, optionally with a dash in front to indicate a negative number.
  3. A boolean value is t for true, or nil for false.
  4. The token NUMBER holds the value as an int.
  5. The token BOOLEAN holds the value true if the matched string is "t".
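To illustrate what those rules produce, here's a plain-F# sketch (not fslex) of the same classification. The Token type is a stand-in for the one fsyacc generates from the %token declarations, so this is an illustration of the rules rather than the generated lexer itself:

```fsharp
// Stand-in for the token type fsyacc generates (hypothetical, for illustration).
type Token =
    | NUMBER of int
    | BOOLEAN of bool
    | END

// Classify a single lexeme the way the rules above would.
let classify (lexeme : string) =
    match lexeme with
    | "t" | "nil" -> BOOLEAN (lexeme = "t")   // boolean: "t" | "nil"
    | s -> NUMBER (System.Int32.Parse s)      // number: '-'?digit+

classify "-42"  // NUMBER -42
classify "nil"  // BOOLEAN false
```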

The parsing is pretty straightforward.

%{
open System
open Microsoft.FSharp.Collections
open Lisp.Statements
%}

%token <int> NUMBER
%token <bool> BOOLEAN
%token END

%start start
%type <Lisp.Statements.Ast> start

%%

start: primitive END { $1 }

primitive:
| NUMBER { Number($1) }
| BOOLEAN { Boolean($1) }

%%

The important rows are:

  1. The token NUMBER holds an int
  2. The token BOOLEAN holds a bool
  3. When parsing a number, create a Number node in the AST.
  4. When parsing a boolean, create a Boolean node in the AST.

Let's look at the AST

namespace Lisp
module Statements =
    type Ast =
    | Number of int
    | Boolean of bool

This is pretty obvious by now. A discriminated union that accepts both numbers and booleans.
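As a quick sanity check, the parse results for a couple of inputs would be the following values (type repeated here so the snippet stands alone):

```fsharp
type Ast =
    | Number of int
    | Boolean of bool

// "42" parses to:
let a = Number 42
// "nil" parses to:
let b = Boolean false
```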

Step 2 - Function calls

The most basic thing to do in Lisp is to call a function. There are some basic functions in the Lisp framework, and we will look at how to make calls to such functions.

First the lexing. We expect function calls to be written as follows.

(add 1 2)

And it would generate tokens like this.

LPAREN IDENTIFIER(add) NUMBER(1) NUMBER(2) RPAREN

We can achieve this with the following lexer.fsl

{
module Lexer
open System
open Parser
open Microsoft.FSharp.Text.Lexing
}

let whitespace = [' ' '\n' '\r']
let digit = ['0'-'9']
let number = '-'?digit+
let boolean = "t" | "nil"
let char = ['a'-'z' 'A'-'Z']

rule tokenize line = parse
| whitespace { tokenize line lexbuf }
| number { NUMBER (Int32.Parse(LexBuffer<_>.LexemeString lexbuf)) }
| boolean { if (LexBuffer<_>.LexemeString lexbuf) = "t" then BOOLEAN(true) else BOOLEAN(false) }
| char+ { IDENTIFIER(LexBuffer<_>.LexemeString lexbuf) }
| "(" { LPAREN }
| ")" { RPAREN }
| eof { END }

The important lines are:

  1. The name of the function we want to call is an identifier, and we need to specify which characters are legal in an identifier.
  2. Several characters in a row make up an IDENTIFIER.
  3. Left parenthesis marks the start of a function call.
  4. Right parenthesis marks the end of a function call.

If we take a look at the parser, we need to add matching for the function call, but most interesting here is how we parse an arbitrary number of parameters.

%{
open System
open Microsoft.FSharp.Collections
open Lisp.Statements
%}

%token <int> NUMBER
%token <bool> BOOLEAN
%token <string> IDENTIFIER
%token LPAREN RPAREN
%token END

%start start
%type <Lisp.Statements.Ast> start

%%

start: expression END { $1 }

expression:
| NUMBER { Number($1) }
| BOOLEAN { Boolean($1) }
| LPAREN IDENTIFIER parameters { Call($2, $3) }

parameters:
| RPAREN { [] }
| expression parameters { $1 :: $2 }

%%

The interesting lines are:

  1. The token for IDENTIFIER holds the name of the identifier in question.
  2. The function call is matched by a left parenthesis, identifier and parameters.
  3. The right parenthesis marks the end of the parameter list.
  4. We build the parameter list recursively, matching a new parameter as long as we don't match a right parenthesis.
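The recursion in the parameters production can be mimicked in plain F#. This is a sketch with strings standing in for tokens, just to show the shape of the recursion:

```fsharp
// Consume expressions until the right parenthesis, consing as the recursion
// unwinds -- the same shape as:
//   parameters: | RPAREN { [] } | expression parameters { $1 :: $2 }
let rec parseParameters tokens =
    match tokens with
    | ")" :: rest -> [], rest          // RPAREN { [] }
    | tok :: rest ->                   // expression parameters { $1 :: $2 }
        let ps, remaining = parseParameters rest
        tok :: ps, remaining
    | [] -> failwith "unexpected end of input"

// The tokens following "(add" in (add 1 2):
parseParameters ["1"; "2"; ")"]  // (["1"; "2"], [])
```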

The abstract syntax tree has evolved slightly, making way for the function call.

namespace Lisp
module Statements =
    type Ast =
    | Number of int
    | Boolean of bool
    | Call of string * Ast list
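For example, the call from earlier, and a nested one, would come out of the parser as the following values (type repeated so the snippet stands alone; the nested input is my own example):

```fsharp
type Ast =
    | Number of int
    | Boolean of bool
    | Call of string * Ast list

// (add 1 2) parses to:
let simple = Call("add", [Number 1; Number 2])
// calls nest naturally, e.g. (add 1 (sub 5 2)):
let nested = Call("add", [Number 1; Call("sub", [Number 5; Number 2])])
```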

We have come to a point where we can invoke an AST. I've written a conversion from our AST to F# quotations, and then I use Linq.QuotationEvaluation to invoke the quotation. Take a look at the following.

module Invoker

open Lisp.Statements
open Microsoft.FSharp.Quotations
open Linq.QuotationEvaluation
open Microsoft.FSharp.Reflection

// functions that are callable from inside my lisp program
let framework = [
    "add", typeof<int -> int -> int>, <@@ (fun a b -> a + b) @@>;
    "sub", typeof<int -> int -> int>, <@@ (fun a b -> a - b) @@>;
    "eq", typeof<int -> int -> bool>, <@@ (fun (a : int) b -> a = b) @@>;
    "if", typeof<bool -> int -> int -> int>, <@@ (fun cond (yes : int) no -> if cond then yes else no) @@>]

// Create an application
// Example: Application (Application (add, Value (1)), Value (2))
let application var args =
    args |> List.fold (fun prev next -> Quotations.Expr.Application(prev, next)) var

// convert ast to Quotations.Expr
let rec toExprUntyped vars = function
    | Number(x) -> Quotations.Expr.Value(x)
    | Boolean(x) -> Quotations.Expr.Value(x)
    | Call(name, arguments) ->
        // resolve arguments
        let argumentExpressions = arguments |> List.map (fun arg -> toExprUntyped vars arg)
        // create application
        application (vars |> Map.find(name)) argumentExpressions

// typed version of toExprUntyped
let toExpr<'a> (vars : Map<string, Quotations.Expr>) ast : Quotations.Expr<'a> =
    (toExprUntyped vars ast) |> Quotations.Expr<'a>.Cast

// take name, function signature and body, create a var and let expression in a tuple
let create_fn (name, signature, lambda) =
    let var = Quotations.Var(name, signature)
    var, (fun (body : Quotations.Expr) -> Quotations.Expr.Let(var, lambda, body))

// make next let expression become body of the previous
let rec mergeLetExpressions<'a> (exprs : (Quotations.Expr -> Quotations.Expr) list) (body : Quotations.Expr<'a>) =
    match exprs with
    | [] -> body
    | hd :: tl -> hd(mergeLetExpressions tl body) |> Quotations.Expr<'a>.Cast

// execute ast
let invoke<'a> (framework : (string * System.Type * Quotations.Expr) list) ast =
    let vars, exprs = framework |> List.map create_fn |> List.unzip
    // create vars map
    let state = vars |> List.map (fun var -> var.Name, Quotations.Expr.Var(var)) |> Map.ofList
    // build expression tree from framework
    let header = mergeLetExpressions<'a> exprs
    // join framework expression tree with interpreted ast
    (header (toExpr<'a> state ast)).Eval()

There is so much going on here that it is hard to explain it all. In short, the invoke function takes a list of predefined quotations that becomes the framework. It builds let-expressions and links them into a chain until there is only one expression body left to fill. That is where we convert and insert our AST.
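The chaining idea can be shown in miniature, with plain strings instead of quotations. wrapAdd and wrapSub are made-up stand-ins for what create_fn produces; each is a wrapper that still needs a body, and merging nests them until only the innermost body (our AST) is left:

```fsharp
// Made-up wrappers standing in for the let-expression builders from create_fn.
let wrapAdd body = sprintf "let add = (+) in %s" body
let wrapSub body = sprintf "let sub = (-) in %s" body

// Same shape as mergeLetExpressions: fold the wrappers around the body.
let merge wrappers body =
    List.foldBack (fun wrap acc -> wrap acc) wrappers body

merge [wrapAdd; wrapSub] "add 1 2"
// "let add = (+) in let sub = (-) in add 1 2"
```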

Step 3 - Defining functions

The language comes to life when we are able to define our own functions. If I were to write a complete, production-ready interpreter I would not create my own keyword for defun. Instead I would use the structures that are already in place and just make defun a part of the framework. This is, however, an experiment made for learning.

(defun myAdd (x y) (add x y))
(myAdd 1 2)

First the lexer.

{
module Lexer
open System
open Parser
open Microsoft.FSharp.Text.Lexing
}

let whitespace = [' ' '\t']
let newline = ('\n' | '\r' '\n')
let digit = ['0'-'9']
let number = '-'?digit+
let boolean = "t" | "nil"
let char = ['a'-'z' 'A'-'Z']

rule tokenize line = parse
| whitespace { tokenize line lexbuf }
| newline { lexbuf.EndPos <- lexbuf.EndPos.NextLine; tokenize (line + 1) lexbuf; }
| "defun" { DEFUN }
| number { NUMBER (Int32.Parse(LexBuffer<_>.LexemeString lexbuf)) }
| boolean { if (LexBuffer<_>.LexemeString lexbuf) = "t" then BOOLEAN(true) else BOOLEAN(false) }
| char+ { IDENTIFIER(LexBuffer<_>.LexemeString lexbuf) }
| "(" { LPAREN }
| ")" { RPAREN }
| eof { END }

Not much happened here. We added a new DEFUN token, matched by the literal "defun".

While looking at the parser you'll notice that we're not using the same structure for reading arguments as we were for reading parameters, simply because we don't want the programmer to use primitives as function arguments in a defun. That would be crazy.

%{
open System
open Microsoft.FSharp.Collections
open Lisp.Statements

%}

%token <int> NUMBER
%token <bool> BOOLEAN
%token <string> IDENTIFIER
%token DEFUN
%token LPAREN RPAREN
%token END

%start start
%type <Lisp.Statements.Ast> start

%%

start: expression END { $1 }

expression:
| NUMBER { Number($1) }
| BOOLEAN { Boolean($1) }
| IDENTIFIER { Identifier($1) }
| LPAREN DEFUN IDENTIFIER LPAREN arguments expression RPAREN expression { Defun($3, $5, $6, $8) }
| LPAREN IDENTIFIER parameters { Call($2, $3) }

parameters:
| RPAREN { [] }
| expression parameters { $1 :: $2 }

arguments:
| RPAREN { [] }
| IDENTIFIER arguments { $1 :: $2 }

%%

There are some things noteworthy.

  1. After the defun expression there is another expression. That is simply the rest of the program, with the defun expression in scope.
  2. The difference between reading arguments and reading parameters is that for arguments we only accept identifiers. If the programmer enters a primitive as an argument, they get a parse error.

There have been some additions to the AST: the Variable type. Here it is used for defining parameters to a function, but it could also be used for storing values in variables in a future version.

namespace Lisp
module Statements =    
    type Ast = 
    | Unparsed
    | Number of int
    | Boolean of bool
    | Call of string * Ast list
    | Identifier of string
    | Defun of string * Variable list * Ast * Ast

    and Variable = string
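Put together, the two-expression program from the start of this step parses into a single Defun node, where the last field is the rest of the program with myAdd in scope (types repeated so the snippet stands alone):

```fsharp
type Ast =
    | Number of int
    | Boolean of bool
    | Call of string * Ast list
    | Identifier of string
    | Defun of string * Variable list * Ast * Ast
and Variable = string

// (defun myAdd (x y) (add x y))
// (myAdd 1 2)
let program =
    Defun("myAdd", ["x"; "y"],
          Call("add", [Identifier "x"; Identifier "y"]),   // function body
          Call("myAdd", [Number 1; Number 2]))             // rest of the program
```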

Back to the invoker. The big change here is where we handle the Defun AST node. A weakness is that we lock parameters into being ints. This could be mended by a reverse lookup of the type from the context in which each parameter is used in the body.

Another weakness is that we don't support direct recursion: you can't call a function from within the function itself. Fixing those issues will be another project.
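Before the full listing, the lambda-building step for Defun can be tried in isolation. This sketch uses only the quotation constructors in FSharp.Core; the variable names are my own example:

```fsharp
open Microsoft.FSharp.Quotations

// Fold the parameter variables around a body, innermost first. Reversing the
// list first yields fun x -> fun y -> body rather than fun y -> fun x -> body.
let localVars = [Var("x", typeof<int>); Var("y", typeof<int>)]
let body : Expr = Expr.Value(0)

let lambdaExpr =
    localVars
    |> List.rev
    |> List.fold (fun expr var -> Expr.Lambda(var, expr)) body

// lambdaExpr.Type is int -> int -> int
```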

module Invoker

open Lisp.Statements
open Microsoft.FSharp.Quotations
open Linq.QuotationEvaluation
open Microsoft.FSharp.Reflection

// functions that are callable from inside my lisp program
let framework = [
    "add", typeof<int -> int -> int>, <@@ (fun a b -> a + b) @@>;
    "sub", typeof<int -> int -> int>, <@@ (fun a b -> a - b) @@>;
    "eq", typeof<int -> int -> bool>, <@@ (fun (a : int) b -> a = b) @@>;
    "if", typeof<bool -> int -> int -> int>, <@@ (fun cond (yes : int) no -> if cond then yes else no) @@>;
    "lt", typeof<int -> int -> bool>, <@@ (fun (a : int) b -> a < b ) @@>]

// convert ast to Quotations.Expr
let rec toExprUntyped vars = function
    | Number(x) -> Quotations.Expr.Value(x)
    | Boolean(x) -> Quotations.Expr.Value(x)
    | Identifier(x) -> vars |> Map.find(x)
    | Call(name, arguments) ->
        // resolve arguments
        let argumentExpressions = arguments |> List.map (fun arg -> toExprUntyped vars arg)
        // create application
        argumentExpressions |> List.fold (fun prev next -> Quotations.Expr.Application(prev, next)) (vars |> Map.find(name))
    // debug values
    // let name = "myAdd"
    // let parameters = [("x", typeof<int>); ("y", typeof<int>)]
    // let bodyAst = Call ("add", [(Identifier "x"); (Identifier "y")])
    | Defun(name, parameters, bodyAst, inscopeAst) ->
        // create function local variables (NOTE: locked into parameters as ints)
        let localVars = parameters |> List.map (fun param -> Quotations.Var(param, typeof<int>))
        // create function local variables expressions
        let localVarsExpr = localVars |> List.map (Quotations.Expr.Var)
        // create local scope
        let localScope =
            List.zip parameters localVarsExpr
            |> List.fold (fun scope (paramName, varExpr) -> scope |> Map.add paramName varExpr) vars
        // evaluate body
        let bodyExpr = toExprUntyped localScope bodyAst
        // create body lambda
        let lambdaExpr = localVars |> List.rev |> List.fold (fun expr var -> Quotations.Expr.Lambda(var, expr)) bodyExpr
        // create function handle
        let funcVar = Quotations.Var(name, lambdaExpr.Type)
        // create let expression
        let letExpr next = Quotations.Expr.Let(funcVar, lambdaExpr, next)
        // return evaluation of next, with this function in scope
        letExpr (toExprUntyped (vars.Add(name, Quotations.Expr.Var(funcVar))) inscopeAst)
    | x -> failwith (sprintf "Unknown program construct %A" x)

// typed version of toExprUntyped
let toExpr<'a> (vars : Map<string, Quotations.Expr>) ast : Quotations.Expr<'a> =
    (toExprUntyped vars ast) |> Quotations.Expr<'a>.Cast

// take name, function signature and body, create a var and let expression in a tuple
let create_fn (name, signature, lambda) =
    let var = Quotations.Var(name, signature)
    var, (fun (body : Quotations.Expr) -> Quotations.Expr.Let(var, lambda, body))

// make next let expression become body of the previous
let rec mergeLetExpressions<'a> (exprs : (Quotations.Expr -> Quotations.Expr) list) (body : Quotations.Expr<'a>) =
    match exprs with
    | [] -> body
    | hd :: tl -> hd(mergeLetExpressions tl body) |> Quotations.Expr<'a>.Cast

// execute ast
let invoke<'a> (framework : (string * System.Type * Quotations.Expr) list) (ast : Ast) =
    let state, exprs = framework |> List.map create_fn |> List.unzip
    // create vars map
    let vars = state |> List.map (fun var -> var.Name, Quotations.Expr.Var(var)) |> Map.ofList
    // build expression tree from framework
    let header = mergeLetExpressions<'a> exprs
    // join framework expression tree with interpreted ast
    (header (toExpr<'a> vars ast)).Eval()

Yes, it is hard to read, but at least well commented. :)

Summary

While figuring out how lexing and parsing work, I spent a lot of time surfing the web for others with the same affliction. It was hard; there was almost no information around. I hope this will help someone who is looking for more information and ideas about lexing and parsing with the F# toolset.
