Encoding of Semantics

Many of Shen’s semantics are translated directly to matching concepts in JavaScript. Where there are incongruities, they are described here.

Data Types

numbers, strings, exceptions, absvectors

Shen numbers, strings, exceptions (Error) and absvectors (array) are just their related JavaScript types with no special handling.

symbols

Shen symbols are interned JavaScript symbols fetched with Symbol.for so two Shen symbols with the same name will be equivalent.

empty list

JavaScript null is used for the empty list.

conses

ShenScript declares a Cons class specifically to represent conses and cons lists.

functions, lambdas, continuations

All function-like types are JavaScript functions that have been wrapped in logic to automatically perform partial and curried application.

streams

Streams are implemented in a custom way by the host. Typically, they are objects with a close function and a readByte or writeByte function.

Special Forms

if, and, or, cond

Conditional code translates directly into matching JavaScript conditionals. cond expressions are turned into if - else chains.

simple-error, trap-error

Error handling works just like JavaScript, using the throw and try-catch constructs.

Both throw and try are statement syntax in JavaScript, so to make them expressions, there is the raise function for throwing errors and try-catch is wrapped in an iife so it can be embedded in expressions. That iife is async and its immediate invocation is awaited.

defun, lambda, freeze

All function forms build and return JavaScript functions.

let

Local variable bindings are translated into immediately-invoked lambda expressions. This is both for brevity and it because it’s the most natural way to translate Lisp-style let-as-an-expression since variable declarations are statements in JavaScript.

Consider that the form (let X Y Z) can be transformed into the behaviorally equivalent ((lambda X Z) Y). ShenScript does not do it precisely this way because of involved optimizations, but that’s the idea.

do

The do form isn’t officially a form in Shen, just a function that returns it’s second argument, but it is handled as a form in ShenScript in order to take advantage of an additional opportunity for tail-call optimisation.

Shen Booleans vs JavaScript Booleans

Shen uses the symbols true and false for booleans and does not have truthy or falsy semantics like JavaScript or other Lisps. This can make things tricky since Shen’s true and false will always be considered true by JavaScript, and in JavaScript, anything not falsy will count as true and false.

The KLambda-to-JavaScript transpiler does not actually consider booleans to be their own datatype, it treats them as any other symbol.

When doing interop between Shen and JavaScript, it will necessary to carefully convert between the two boolean representations as with so much JavaScript code’s dependence on truthy/falsy semantics, there is no general way of doing so.

Equality

Equality sematics are implemented by the equate function in the backend module.

Values are considered equivalent in ShenScript if they are equal according to the JavaScript === operator or in the following cases:

  • Both values are Cons and both their head and tail are equal according to equate.

  • Both values are JavaScript arrays of the same length and all of their values are equal according to equate.

  • Both values are JavaScript objects with the same constructor, same set of keys and the values for each key are equal according to equate.

Partial Function Application

In Shen, functions have precise arities, and when a function is applied to fewer arguments than it takes, it returns a function that takes the remaining arguments. So since + takes two arguments, if it is applied to a single one, as in (+ 1), the result is a function that takes one number and returns 1 plus that number.

ShenScript also supports curried application, where there are more arguments than the function actually takes. The function is applied to the first N arguments equal to the function’s arity, the result is asserted to be a function, and then the resulting function is applied to the remaining arguments. Repeat this process until there are no remaining arguments or until a non-function is returned and an error is raised.

Warning

Curried application is not a part of the Shen standard, is not supported by shen-cl, and might be removed from ShenScript.

This is implemented in ShenScript for primitives, kernel functions and user-defined functions by wrapping them in a function that takes a variable number of arguments, checks the number passed in, and then returns another function to take the remaining arguments, performs curried application, or simply returns the result.

Tail-Call Optimization

Tail-call optimization is required by the Shen standard and Shen code make prolific use of recursion making TCO a necessity.

In ShenScript, tail calls are handled dynamically using trampolines. When a function is built by the transpiler, the lexical position of expressions are tracked as being in head or tail position. Function calls in head position are a simple invocation and the result is settled; calls in tail position are bounced.

settle

Settling is the process of taking a value that might be a Trampoline, checking if it’s a tramoline, and if it is, running it. The result of running the trampoline is checked if it’s a trampoline, and if so, that is run and this process is repeated until the final result is a non-trampoline value, which is returned.

bounce

Bouncing a function call means making a trampoline from a reference to the function and the list of arguments and returning. The function will actually be invoked when the trampoline is settled at some point later.

Generation of Syntax

Generating JavaScript

ShenScript code generation is built on rendering objects rendering a JavaScript abstract syntax tree conforming to the informal ESTree standard. These ASTs are then rendered to strings using the astring library and then evaluated with an immediately-applied Function constructor.

The Function constructor acts as a kind of isolated-scope version of eval. When a Function is created this way, it does not capture local variables like eval does. This prevents odd behavior from cropping up where Shen code uses a variable or function name that matches some local variable.

Hoisting Globals and Idle Symbols

Lookups of global functions, global symbol values (set/value) and idle symbols (Symbol.for) don’t take up much time, but when done repeatedly are wasteful. To prevent repeated lookups, references to the aforementioned are hoisted to the top of each expression tree evaluated by eval-kl and to the top of the pre-rendered kernel. This way, they only get fetched once and are enclosed over so each time a function is called that depends on one of these, it only needs to access a local variable in scope.

Global functions and global symbol values with the same name are both attached to the same lookup Cells and those Cells get assigned to escaped local variables with $c appended to the end. Idle symbols get assigned to escaped local variables with $s appended to the end.

Fabrications

Just as a Context carries information downward while building an AST, a Fabrication carries resulting context back upward. A fabrication contains the subsection of the AST that was built, along with the results of decisions that were made down in that tree so it doesn’t have to be scanned again.

Fabrications are useful because they are easily composible. There is a composition function for fabrications called assemble which takes a function to combine ASTs and a list of fabrications as arguments. The combining function gets us a single AST and the rest of the metadata being carried by the fabrications has a known means of combination. The result is a single AST and a single body of metadata out of which a single fabrication is made.

At the moment, the only additional information fabrications carry is a substitution map of variable names and the expressions that they will need to be initialized with. The substitution map is used to “hoist” global references to the top of the scope being constructed. When an AST is constructed that depends on one of these substitutions, it refers to a variable by the name specified as a key in the map.

Escaping Special Variable Names

There are still some variables that need to be accessible to generated code, but not to the source code - referenceable in JavaScript, but not Shen. The main example is the environment object, conventionally named $. Dollar signs and other special characters get escaped by replacing them with a dollar sign followed by the two-digit hex code for that character. Since dollar signs are valid identifier characters in JavaScript, hidden environment variables can be named ending with dollar sign, because if a Shen variable ends with $, the escaped JavaScript name will have a trailing $24 instead.

Dynamic Type-Checking

Many JavaScript operators, like the + operator, are not limited to working with specific types like they are in Shen. In JavaScript, + can do numberic addition, string concatenation and offers a variety of strange behaviors are argument types are mixed. In Shen, the + function only works on numbers and passing a non-number is an error.

So in order to make sure these primitive functions are being applied to the correct types, there are a series of is functions like isNumber, isString and isCons, which determine if a value is of that type. There are also a series of as functions for the same types which check if the argument is of that type and returns it if it is, but raises an error if it is not.

This allows concise definition of primitive functions. The + primitve is defined like this:

(x, y) => asNumber(x) + asNumber(y)

and the cn primitive is defined like this:

(x, y) => asString(x) + asString(y)

Code Inlining and Optimisation

To reduce the volume of generated code, and to improve performance, most primitive operations are inlined when fully applied. Since the type checks described in the previous section are still necessary, they get inlined as well, but can be excluded on certain circumstances. For instance, when generating code for the expression (+ 1 X), it is certain that the argument expression 1 is of a numeric type as it is a numeric constant. So instead of generating the code asNumber(1) + asNumber(X), we can just render 1 + asNumber(X).

The transpiler does this simple type inference following a few rules:

  • Literal numeric, string and idle symbol values are inferred to be of those types.

    • 123 is Number.

    • “hello” is String.

    • thing is Symbol.

  • The value returned by a primitive function is inferred to be of that function’s known return type, regardless of the type of the arguments. If the arguments are of an unexpected type, an error will be raised anyway.

    • (+ X Y) is Number.

    • (tlstr X) is String.

    • (cons X Y) is Cons.

  • Local variables in let bindings are inferred to be of the type their bound value was inferred to be.

    • The X in (+ X Y) in (let X 1 (+ X Y)) is Number. X would not need an asNumber cast in (+ X Y).

  • The parameter to a lambda expression used as an error handler in a trap-error form is inferred to be Error.

    • (trap-error (whatever) (/. E (error-to-string E))) does not generate an asError check for E.

More sophisticated analysis could be done, but with dimishing returns in the number of cases it actually catches. And consider that user-defined functions can be re-defined, either in a REPL session or in code loaded from a file, meaning assumptions made by optimised functions could be invalidated. When a function was re-defined, all dependent functions would have to be re-visited and potentially all functions dependent on those functions. That’s why these return type assumptions are only made for primitives.

Pervasive Asynchronocity

I/O functions, including primitives like read-byte and write-byte are idiomatically synchronous in Shen. This poses a problem when porting Shen to a platform like JavaScript which pervasively uses asynchronous I/O.

Functions like read-byte and open are especially problematic, because where write-byte and close can be fire-and-forget even if they do need to be async, Shen code will expect read-byte and open to be blocking and the kernel isn’t designed to await a promise or even pass a continuation.

In order to make the translation process fairly direct, generated JavaScript uses async/await syntax so that code structured like synchronous code can actually be asynchronous. This allows use of async functions to look just like the use of sync functions in Shen, but be realized however necessary in the host language.

async code generation is controlled by a flag on the Context object that the compiler disables for functions that it is sure can be synchronous.

Future Design Options

Some designs are still up in the air. They either solve remaining problems like performance or provide additional capabilites.

Polychronous Functions

Currently, it is difficult to be sure an execution path will be entirely synchronous so the compiler plays it safe and only renders functions as sync when it is sure that function and its referents are sync.

Instead, the compiler could render both sync and async versions of most functions and choose which to call on a case-by-case basis. And when a referent is re-defined from sync to async or vice-versa, the environment could quickly switch the referring function from preferring one mode or the other.

For example, if we have Shen code like (map F Xs) and F is known to be sync, we can call the sync version of map which is tail-recursive or is a simple for-loop by way of a pinhole optimisation. This way, we won’t have to evaluate the long chain of promises and trampolines the async version would result in for any list of decent length.

The compiler would have to keep track of additional information like which functions are always sync, which are always async and which can be one or the other and based on what criteria.

KLambda-Expression Interpreter

Some JavaScript environments will have a Content Security Policy enabled that forbids the use of eval and Function. This would completely break the current design of the ShenScript evaluator. The transpiler would continue to work, and could produce JavaScript ASTs, but they could not be evaluated.

A scratch ESTree interpreter could be written, but as it might need to support most of the capabilities of JavaScript itself, it would easier to write an interpreter that acts on the incoming KLambda expression trees themselves and forgo the transpiler entirely.

The obvious downside is that the interpreter would be much slower that generated JavaScript which would enjoy all the optimisations built into a modern runtime like V8. The interpreter would only be used when it was absolutely necessary.

Expression Type Tracking

Right now, the compiler only tracks the known types of local variables and nested expressions. It could also retain information like (tl X) is a cons and not just that X is a cons and then have to check again later when evaluating (hd (tl X)). This would remove plenty of asCons calls that are not strictly necessary.

Historical and Abandoned Design

String Concatenation

In a much earlier version, code generation was done with JavaScript template strings and string concatenation. This was replaced with the use of the astring library since it is cleaner, more reliable and more flexible to have an AST that can undergo further manipulation as opposed to a final code string that can only be concatenated.

Using Fabrications for Statement-oriented Syntax

Shen’s code style is very expression-oriented and is most easily translated to another expression-oriented language. Most of Shen’s forms translate directly to expression syntax in JavaScript without a problem. All function calls are expressions in JavaScript, conditions can use the ? : ternary operator, etc. Two forms in particular pose a problem: let and trap-error would most naturally be represented with statements in JavaScript.

We could just emit VariableDeclaration and TryStatement AST nodes for these forms, but that causes a compilication when either a statement or an expression might be emitted by the transpiler at any point in the code. And while it’s easy to embed an expression in a statement construct - just by wrapping in an ExpressionStatement - it’s harder to embed a statement in an expression.

An expression like (+ 1 (trap-error (whatever) (/. _ 0))) would normally be rendered like 1 + asNumber(trap(() => whatever(), _ => 0)). How would it be rendered if we wanted to inline a try-catch instead of using the trap function?

The concept of a Fabrication (aka “fabr”) was introduced to represent the composition of the two forms of syntax. Whenever a child form is built, it would return a fabr, consisting of a list of prerequiste statements and a resulting expression. Fabrs are typically composed by making a new fabr with all the statements from the first fabr, followed by the statements from the second fabr and then the result expressions are combined as they would be in the current design.

Since every fabr needs a result expression, for statement syntax, an additional variable declaration is added for a result variable and the result of the fabr is just the identifier expression for that variable.

An example like (+ (let X 3 (* X 2)) (trap-error (whatever) (/. _ 0))) would get rendered like this. The (let X 3 (* X 2)) expression becomes:

{
  "statements": [
    << const X = 3; >>
  ],
  "result": << X * 2 >>
}

The (trap-error (whatever) (/. _ 0)) becomes:

{
  "statements": [
    << let R123$; >>,
    <<
      try {
        R123$ = settle(whatever());
      } catch (e$) {
        R123$ = 0;
      }
    >>
  ],
  "result": << R123$ >>
}

And composed together, they are:

{
  "statements": [
    << const X = 3; >>
    << let R123$; >>,
    <<
      try {
        R123$ = settle(whatever());
      } catch (e$) {
        R123$ = 0;
      }
    >>
  ],
  "result": << (X * 2) + asNumber(R123$) >>
}

This whole approach was attempted on the premise that using more idiomatic JavaScript syntax would give the runtime more opportunities to identify optimisations vs using trap and immediately-invoked lambdas. Turns out using fabrs produced about twice the code volume and benchmarks took 3-4 times as long to run. I guess V8 is really good at optimising IIFEs. So fabrs were reverted. The design is documented here for historical reasons.

This approach could be brought back in order to better handle trap-error at the root of a lambda, avoiding an additional iife.

Another possibility is this would allow the introduction of js.while, js.for, etc. types of special forms where Shen code could directly invoke imperative syntax.