The THP Language Specification

This series of pages defines the THP programming language.

THP’s grammar is context-dependent.

The syntax is specified using an informal mix of Extended Backus-Naur Form (EBNF) and regular expression operators:

; comments

syntax        = concatenation
concatenation = production1, production2

alternation   = "a" | "b"
              | "c"
grouping      = ("a", "b")

optional      = "a"?
one_or_more   = "a"+
zero_or_more  = "a"*

range         = "1".."9"
literal       = "a"
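Since the postfix operators and the range form are borrowed from regular expressions, each sample production above has a direct regex counterpart. The following Python sketch is illustrative only: the variable names and the `matches` helper are hypothetical, the concatenation example uses the literals "a" and "b" in place of `production1` and `production2`, and Python is not part of THP's tooling.

```python
import re

# Each pattern mirrors one sample production from the notation above.
concatenation = re.compile(r"ab")       # "a", "b"
alternation   = re.compile(r"a|b|c")    # "a" | "b" | "c"
grouping      = re.compile(r"(?:ab)")   # ("a", "b")
optional      = re.compile(r"a?")       # "a"?
one_or_more   = re.compile(r"a+")       # "a"+
zero_or_more  = re.compile(r"a*")       # "a"*
digit_range   = re.compile(r"[1-9]")    # "1".."9"


def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True when the whole text can be derived from the pattern."""
    return pattern.fullmatch(text) is not None


print(matches(one_or_more, "aaa"))  # True
print(matches(one_or_more, ""))     # False
print(matches(digit_range, "0"))    # False
```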

Compiler architecture

The compiler consists of 5 common phases:

Source Code representation

Source code is encoded in UTF-8, and a single Unicode code point is treated as a single character.
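As a rough illustration of the difference between bytes and characters under this definition, the Python sketch below decodes a UTF-8 buffer and counts code points. The function name `source_chars` and the sample source line are assumptions made for this example, not part of the compiler.

```python
def source_chars(raw: bytes) -> list[str]:
    """Decode UTF-8 source bytes and return one entry per Unicode code point."""
    # Strict decoding raises UnicodeDecodeError on invalid UTF-8 input.
    text = raw.decode("utf-8", errors="strict")
    return list(text)


source = 'val s = "ñ日本"'.encode("utf-8")
chars = source_chars(source)

# The byte count and the character (code point) count differ
# as soon as the source contains non-ASCII text.
print(len(source), len(chars))  # 18 13
```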

Basic characters

Although the source code must be encoded in UTF-8, most of the actual source code will use only the basic 128 ASCII characters. String contents may contain any Unicode code point.

underscore    = "_"

decimal_digit = "0".."9"
binary_digit  = "0" | "1"
octal_digit   = "0".."7"
hex_digit     = decimal_digit | "a".."f" | "A".."F"

lowercase_letter = "a".."z"
uppercase_letter = "A".."Z"
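The character classes above translate directly into range checks. The following Python sketch shows one possible encoding; the helper names are hypothetical and simply mirror the production names.

```python
def _in_range(c: str, low: str, high: str) -> bool:
    """True when c is exactly one character between low and high, inclusive."""
    return len(c) == 1 and low <= c <= high


def is_underscore(c: str) -> bool:
    return c == "_"


def is_decimal_digit(c: str) -> bool:
    return _in_range(c, "0", "9")


def is_binary_digit(c: str) -> bool:
    return c in ("0", "1")


def is_octal_digit(c: str) -> bool:
    return _in_range(c, "0", "7")


def is_hex_digit(c: str) -> bool:
    return is_decimal_digit(c) or _in_range(c, "a", "f") or _in_range(c, "A", "F")


def is_lowercase_letter(c: str) -> bool:
    return _in_range(c, "a", "z")


def is_uppercase_letter(c: str) -> bool:
    return _in_range(c, "A", "Z")
```

Explicit ASCII ranges are used instead of `str.isdigit` or `str.isalpha` because those built-ins also accept non-ASCII digits and letters, which the productions above exclude.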

Whitespace & Automatic semicolon insertion

This section is being reworked as part of the Zig rewrite of the compiler.

THP is mostly whitespace insensitive. However, it has special rules for statement termination so that semicolons are not needed.

Certain statements have clearly defined markers of termination. For example, an if statement always has braces {}, so the closing brace } is the terminator. The same applies to parentheses, square brackets, etc.

Other statements require an explicit terminator. For example, the assignment statement:

val computation = 123 + 456  // how to detect if the statement ends here
* 789                        // or extends up to here?

In other languages a semicolon would be used to signal the end of the statement:

int computation = 123 + 456
* 789;

THP does not use semicolons. Instead, THP has one strict rule and one exception to that rule:

All statements end with a newline

No matter the indentation, whitespace or others, every statement ends with a newline.

val compute = 1 + 2 * 3 / 4
// statement ends here     ↑

As mentioned before, this does not affect statements that have clear delimiters. For example, the following code will work as expected:

val compute = my_function(
  param1,
  param2,
) / 64
//    ↑ statement ends here

In a way, the parentheses “disable” the rule.

But how can a statement span multiple lines?

Exception: operator on the next line.

If the next line begins with any operator, the statement of the previous line continues.

For example:

val computation = 123 + 456
* 789
//   ↑ statement ends here, and there is a single statement

This holds regardless of indentation:

// weird indentation:

  val computation = 123 + 456

- 789
// ↑ statement still ends here

What is important is that an operator begins the new line. If the operator is left on the previous line, this will not work:

//       statement ends here ↓, and now there is a syntax error (dangling operator)
val computation = 123 + 456 *
789
// ↑ this is a different statement

To implement this, the parser must look ahead one token past the newline. This is the only place where the parser does so.
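The rule and its exception can be sketched as a small statement splitter. The Python code below is only an illustration under several assumptions that are not stated in this specification: the token kinds and the operator set are invented for the example, consecutive newlines are collapsed into one NewLine token so that a single token of look-ahead is enough, only parentheses and square brackets are tracked as delimiters (blocks with braces are out of scope), and the dangling-operator error is not reported.

```python
import re

# Hypothetical token kinds for this sketch; comments must precede "op"
# so that "//" is not read as two division operators.
TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\n]*)
  | (?P<newline>\n)
  | (?P<space>[ \t\r]+)
  | (?P<number>[0-9]+)
  | (?P<ident>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<op>[+\-*/=])
  | (?P<open>[(\[])
  | (?P<close>[)\]])
  | (?P<comma>,)
""", re.VERBOSE)


def tokenize(src: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs, dropping comments and horizontal whitespace."""
    tokens: list[tuple[str, str]] = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r}")
        pos = m.end()
        kind = m.lastgroup
        if kind in ("comment", "space"):
            continue
        # Assumption: blank lines collapse into a single newline token.
        if kind == "newline" and tokens and tokens[-1][0] == "newline":
            continue
        tokens.append((kind, m.group()))
    return tokens


def split_statements(tokens: list[tuple[str, str]]) -> list[list[str]]:
    """Group token texts into statements using the newline rule."""
    statements: list[list[str]] = []
    current: list[str] = []
    depth = 0  # nesting level of open parentheses and square brackets
    for i, (kind, text) in enumerate(tokens):
        if kind == "open":
            depth += 1
        elif kind == "close":
            depth -= 1
        if kind != "newline":
            current.append(text)
        elif depth == 0 and current:
            # One token of look-ahead: an operator at the start of the
            # next line continues the current statement.
            next_is_op = i + 1 < len(tokens) and tokens[i + 1][0] == "op"
            if not next_is_op:
                statements.append(current)
                current = []
    if current:
        statements.append(current)
    return statements


one = split_statements(tokenize("val computation = 123 + 456\n* 789\n"))
two = split_statements(tokenize("val computation = 123 + 456 *\n789\n"))
print(len(one), len(two))  # 1 2
```

Newlines inside an open delimiter are skipped because `depth` is greater than zero, which is how the parentheses “disable” the rule in this sketch.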

Old Whitespace rules

THP is partially whitespace sensitive. It uses three tokens, Indent, Dedent and NewLine, to determine when an expression spans multiple lines.

The lexer stores the indentation level of every line and, when scanning the next line, compares the previous indentation to the new one. If the amount of whitespace is greater than before, it emits an Indent token. If it is lower, it emits a Dedent token, and if it is the same, it does nothing.

1 + 2
  + 3
  + 4

The previous code would emit the following tokens: 1 + 2 NewLine Indent + 3 NewLine + 4 Dedent

Additionally, it is a lexical error to have wrong indentation. The lexer stores all previous indentation levels in a stack, and reports an error if a decrease in indentation doesn’t match a previous level.

if true {     // 0 indentation
    print()   // 4 indentation
  print()     // 2 indentation. Error. There is no 2-indentation level
}
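The indentation stack described above can be sketched in a few lines of Python. This is only an illustration of the technique, not the original lexer: the names `lex_indentation` and `LexError` are hypothetical, indentation is measured in spaces only, blank lines are skipped, line contents are split on whitespace instead of being properly tokenized, and any indentation still open at the end of the input is closed with Dedent tokens.

```python
class LexError(Exception):
    """Raised when a dedent does not match any previous indentation level."""


def lex_indentation(src: str) -> list[str]:
    tokens: list[str] = []
    stack = [0]  # all indentation levels currently open
    lines = [line for line in src.split("\n") if line.strip()]
    for n, line in enumerate(lines):
        width = len(line) - len(line.lstrip(" "))
        if n > 0:
            tokens.append("NewLine")
        if width > stack[-1]:
            # Deeper than before: open a new level.
            stack.append(width)
            tokens.append("Indent")
        else:
            # Shallower than before: close levels until one matches.
            while width < stack[-1]:
                stack.pop()
                tokens.append("Dedent")
            if width != stack[-1]:
                raise LexError(f"line {n + 1}: no {width}-indentation level")
        tokens.extend(line.split())
    # Close any levels still open at the end of the input.
    while len(stack) > 1:
        stack.pop()
        tokens.append("Dedent")
    return tokens


print(" ".join(lex_indentation("1 + 2\n  + 3\n  + 4")))
# 1 + 2 NewLine Indent + 3 NewLine + 4 Dedent
```

Running it on the three-line expression reproduces the token sequence listed earlier, and the `if true` example raises `LexError` at its third line because 2 is not on the stack once the 4-level has been popped.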

All productions of the grammar ignore whitespace/indentation, except those involved in semicolon inference.

Statement termination / Semicolon inference

Whitespace is used to determine where a statement ends and a new one begins only inside a block of code. Everywhere else, whitespace is ignored.

Statements in THP end when a new line is encountered:

// The statement ends         | here, on the newline
val value = (123 + 456) * 0.75

// Each line contains a different statement. They all end on their new lines

var a = 1 + 2 // a = 3

- 3 // this is not part of `a`, this is a different statement

This is true even if the line ends with an operator:

// These are still different statements

var a = 1 + 2 + // This is now a compile error, there is a hanging operator
3 // This is still a different statement

Parentheses

Exception 1: When a parenthesis is opened, all following whitespace is ignored until the closing parenthesis.

// open parenthesis found, all whitespace is ignored until the closing
name.contains(
"weird"
  )

However, for a parenthesis to take effect, it must be opened on the same line as the statement it belongs to.

// Still 2 statements, because the parenthesis is in a new line
print
(
  "hello"
)

// Now it's one single statement
print(
"hello"
)

Indented binary operator

Exception 2:

val sum = 1 + 2 +   // The line ends with a binary operator
  3               // There is indentation

val sum = 1 + 2
  + 3             // Indentation and a binary operator

In these cases, all whitespace will be ignored until the indentation returns to the initial level.

// This method chain is a single statement because of the indentation
val person = PersonBuilder()
  .set_name("john")
  .set_lastname("doe")
  .set_age(32)
  .set_children(2)
  .build()

// Here indentation returns, and a new statement begins
print(person)