import Info from "@/components/docs/Info.astro";
import Code from "@/components/Code.astro";
This page (and, in the future, the pages that follow) defines the language.
THP is a strongly and statically typed programming language that is transpiled to PHP. It is designed to improve on PHP's shortcomings, mainly with a better type system, better syntax and semantics, and better integration with tooling.
The compiler will have 5 phases:
Source code must be UTF-8 encoded. Any non-ASCII bytes that appear within a string literal carry their UTF-8 meaning into the content of the string; the compiler does not modify these bytes.
Furthermore, THP only recognizes LF as the line terminator. Using CRLF will lead to a compiler error.
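Both requirements can be checked before lexing begins. The following is a minimal sketch in Python, for illustration only; `check_source` is a hypothetical name and not part of the THP compiler. It decodes the whole file strictly as UTF-8 and rejects the first CRLF it finds. Strict decoding does not alter string literal contents, since re-encoding the decoded text yields the original bytes.

```python
def check_source(raw: bytes) -> str:
    """Hypothetical pre-lexing check: enforce UTF-8 and LF-only line endings."""
    # errors="strict" (the default) raises UnicodeDecodeError on invalid UTF-8.
    text = raw.decode("utf-8")
    crlf = text.find("\r\n")
    if crlf != -1:
        # Report the 1-based line on which the CRLF appears.
        line = text.count("\n", 0, crlf) + 1
        raise SyntaxError(f"line {line}: CRLF line terminator found, only LF is allowed")
    return text
```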
This document uses a modified version of EBNF that allows RegExp-like modifiers (`?`, `*`, `+`). An example follows:
```
; single line comments
literal = "a"
; ranges iterate over ASCII codepoints
range = "0".."9"
production_1 = character
concatenation = production_1, production_2
alternation = "a" | "b"
alternation_2 = "abc"
              | "jkl"
              | "xyz"
grouping = ("123", "456")
zero_or_one = production?
zero_or_more = production*
one_or_more = production+
```
Although not yet implemented, THP will not use semicolons as statement delimiters. Instead, newlines will serve as statement delimiters.
THP is whitespace insensitive. However, THP has special rules when handling statement termination in order to not use semicolons.
Certain statements have clearly defined markers of termination.
For example, an `if` statement always has braces `{}`, so the closing brace `}` is the terminator.
The same applies to parentheses, square brackets, etc.
Other statements require an explicit terminator. For example, the assignment statement:
<Code thpcode={`
val computation = 123 + 456 // how to detect if the statement ends here?
    * 789
`} />
In other languages, a semicolon would be used to signal the end of the statement:
```c
int computation = 123 + 456
    * 789;
```
THP does not use semicolons. Instead, THP has 1 strict rule and 1 exception to the rule:
No matter the indentation, whitespace, or anything else, every statement ends with a newline.
<Code
    thpcode={`
val compute = 1 + 2 * 3 / 4
// statement ends here ↑
`}
/>
As mentioned before, this does not affect statements that have clear delimiters. For example, the following code will work as expected:
<Code
    thpcode={`
val compute = my_function(
    param1,
    param2,
) / 64
// ↑ statement ends here
`}
/>
In a way, the parentheses "disable" the rule.
But how can a statement span multiple lines?
If the next line begins with any operator, the statement of the previous line continues.
For example:
<Code thpcode={`
val computation = 123 + 456
    * 789
`} />
This holds regardless of indentation:
<Code thpcode={`
// weird indentation:
val computation = 123 + 456
* 789
`} />
What matters is that the new line begins with an operator. If the operator is left at the end of the previous line, this does not work:
<Code
    thpcode={`
// statement ends here ↓, and now there is a syntax error (dangling operator)
val computation = 123 + 456 *
789
// ↑ this is a different statement
`}
/>
To support this, the parser must look ahead 1 token. This is the only place where the parser does so.
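The rule and its exception can be sketched as a single pass over the token stream. This is a minimal Python sketch, assuming tokens are `(kind, text)` pairs that use the `TokenType` names this page lists (`Newline`, `Operator`, ...); `apply_newline_rule` is a hypothetical name, and the real compiler may perform this look-ahead inside the parser instead.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text), e.g. ("Operator", "*")

def apply_newline_rule(tokens: List[Token]) -> List[Token]:
    """Drop every Newline token whose next token is an Operator.

    A Newline normally terminates the statement. With 1 token of
    look-ahead, a following Operator means the statement continues.
    """
    result: List[Token] = []
    for i, tok in enumerate(tokens):
        if tok[0] == "Newline":
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt is not None and nxt[0] == "Operator":
                continue  # next line begins with an operator: statement continues
        result.append(tok)
    return result
```

In the dangling-operator case no Newline is removed, so `456 *` ends a statement with a trailing operator and the parser reports the syntax error.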
```
newline = "\n"
character = '\0'..'\255' ; any single byte value
lowercase_letter = "a".."z"
uppercase_letter = "A".."Z"
underscore = "_"
dot = "."
comma = ","
decimal_digit = "0".."9"
binary_digit = "0" | "1"
octal_digit = "0".."7"
hex_digit = "0".."9" | "a".."f" | "A".."F"
operator_char = "+" | "-" | "=" | "*" | "!" | "/" | "|"
              | "@" | "#" | "$" | "~" | "%" | "&" | "?"
              | "<" | ">" | "^" | "." | ":"
```
This is a summary of all tokens:
```zig
pub const TokenType = enum {
    Int,
    Float,
    Identifier,
    Datatype,
    Operator,
    Comment,
    String,
    // grouping signs
    LeftParen,
    RightParen,
    LeftBracket,
    RightBracket,
    LeftBrace,
    RightBrace,
    // punctuation that carries special meaning
    Comma,
    Newline,
    // Each keyword will have its own token
};
```
A decimal integer cannot have a leading zero: `0644` is a lexical error.
Floating-point numbers, however, can have leading zeros, as in `0.6782e+2`.
In PHP, an integer with a leading zero is not a decimal number but an octal one,
so in PHP `0644 === 420`. To avoid any confusion, THP decimal numbers cannot
have a leading zero; instead, all octal numbers must begin with either `0o` or `0O`.
```
Number = Int | Float

Int = hexadecimal_number
    | octal_number
    | binary_number
    | decimal_number

hexadecimal_number = "0", ("x" | "X"), hex_digit+
octal_number = "0", ("o" | "O"), octal_digit+
binary_number = "0", ("b" | "B"), binary_digit+
decimal_number = "1".."9", decimal_digit*

Float = decimal_digit+, ".", decimal_digit+, scientific_notation?
      | decimal_digit+, scientific_notation

scientific_notation = "e", ("+" | "-"), decimal_digit+
```
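As an illustration, the `Number` productions translate almost one-to-one into regular expressions. The sketch below is Python, and `classify_number` is a hypothetical helper rather than the THP lexer; it checks a complete lexeme against each production.

```python
import re

# One regex per production of the Number grammar.
HEX = re.compile(r"0[xX][0-9a-fA-F]+")
OCT = re.compile(r"0[oO][0-7]+")
BIN = re.compile(r"0[bB][01]+")
DEC = re.compile(r"[1-9][0-9]*")          # no leading zero
SCI = r"e[+-][0-9]+"                       # the sign is mandatory
FLOAT = re.compile(rf"[0-9]+\.[0-9]+(?:{SCI})?|[0-9]+{SCI}")

def classify_number(lexeme: str) -> str:
    """Return "Int" or "Float" for a whole lexeme, or raise ValueError."""
    if FLOAT.fullmatch(lexeme):
        return "Float"
    if any(p.fullmatch(lexeme) for p in (HEX, OCT, BIN, DEC)):
        return "Int"
    raise ValueError(f"invalid number literal: {lexeme!r}")
```

With these rules `0644` is rejected while `0o644` is an `Int`. As written, `decimal_number` starts with a digit from 1 to 9, so a bare `0` matches no `Int` alternative; the sketch inherits that behavior.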
```
Identifier = (underscore | lowercase_letter), identifier_letter*
identifier_letter = underscore | lowercase_letter | uppercase_letter | decimal_digit

Datatype = uppercase_letter, identifier_letter*
```
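The first character alone decides whether a word is an `Identifier` or a `Datatype`. A minimal Python sketch with a hypothetical `classify_word` helper; it ignores keywords, which a real lexer would have to check first, since `val` also matches `Identifier`.

```python
import re

# identifier_letter = underscore | lowercase_letter | uppercase_letter | decimal_digit
IDENTIFIER = re.compile(r"[_a-z][_a-zA-Z0-9]*")  # starts with _ or a-z
DATATYPE = re.compile(r"[A-Z][_a-zA-Z0-9]*")     # starts with A-Z

def classify_word(lexeme: str) -> str:
    """Return "Identifier" or "Datatype"; keywords are not handled here."""
    if IDENTIFIER.fullmatch(lexeme):
        return "Identifier"
    if DATATYPE.fullmatch(lexeme):
        return "Datatype"
    raise ValueError(f"not an identifier or datatype: {lexeme!r}")
```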
If 2 or more operator chars appear together, they form a single operator. That is,
`+-` always becomes a single token, not the two tokens `+` and `-`. The lexer does not
know about specific operators; it only groups operator characters.
```
Operator = operator_char+
```
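Because the lexer only groups operator characters, it can use maximal munch: it keeps consuming while the next character is an `operator_char`. A minimal Python sketch with the hypothetical helper `lex_operator`; it assumes comments (`//`) are recognized before this step, since `/` is itself an operator character.

```python
from typing import Tuple

# operator_char, as listed in the lexical grammar
OPERATOR_CHARS = frozenset("+-=*!/|@#$~%&?<>^.:")

def lex_operator(src: str, start: int) -> Tuple[str, int]:
    """Consume the longest run of operator chars beginning at `start`.

    Returns the operator text and the index just past it.
    """
    end = start
    while end < len(src) and src[end] in OPERATOR_CHARS:
        end += 1
    if end == start:
        raise ValueError(f"no operator at position {start}")
    return src[start:end], end
```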
At this time, only single line comments are allowed.
Strings in THP only use double quotes.
As of this writing, an escape sequence is a backslash followed by any byte except a newline.
```
String = double_quote, (escape_seq | string_char)*, double_quote
escape_seq = backslash, any_except_newline
double_quote = '"'
string_char = any_except_newline_and_double_quote
```
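A minimal Python sketch of scanning a `String` token, following the productions above; `lex_string` is hypothetical. It works on decoded characters rather than raw bytes (an assumption made for readability) and returns the literal exactly as written, without processing escape sequences.

```python
from typing import Tuple

def lex_string(src: str, start: int) -> Tuple[str, int]:
    """Scan a string literal whose opening double quote is at `start`.

    Returns the raw lexeme (quotes and escapes as written) and the index
    just past the closing quote.
    """
    if src[start] != '"':
        raise ValueError("a string must start with a double quote")
    i = start + 1
    while i < len(src):
        ch = src[i]
        if ch == "\n":
            raise ValueError("unterminated string: newline before closing quote")
        if ch == "\\":
            # escape_seq: backslash followed by anything except a newline
            if i + 1 >= len(src) or src[i + 1] == "\n":
                raise ValueError("invalid escape sequence")
            i += 2
            continue
        if ch == '"':
            return src[start:i + 1], i + 1
        i += 1
    raise ValueError("unterminated string: reached end of input")
```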
Each grouping sign has its own token.
```
LeftParen = "("
RightParen = ")"
LeftBracket = "["
RightBracket = "]"
LeftBrace = "{"
RightBrace = "}"
```
In this section of the grammar, plain strings are used instead of keyword productions.
Each THP source file is a module.
```
Module = Statement*
```
For now there is only 1 type of statement.
```
Statement = VariableBinding
```
Variable bindings have 2 forms: immutable and mutable.
Immutable bindings use the `val` keyword; mutable bindings use `var`.
Bindings can have type annotations, placed between the keyword and the identifier.
If the binding is immutable and has a datatype, the `val` keyword
can be dropped. Mutable bindings cannot drop the `var` keyword.
```
VariableBinding = ImmutableBinding | MutableBinding

ImmutableBinding = "val", Datatype?, Identifier, "=", Expression
                 | Datatype, Identifier, "=", Expression

MutableBinding = "var", Datatype?, Identifier, "=", Expression
```
For now, the only expression recognized is a number.
```
Expression = Number
```
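To make the binding rules concrete, here is a minimal recursive-descent sketch in Python for a single `VariableBinding`. It assumes `(kind, text)` tokens in which the keywords `val` and `var` have the hypothetical kinds `"Val"` and `"Var"` (this page says each keyword gets its own token but does not name them), and `=` arrives as an `Operator` token. `parse_binding` and `Binding` are illustrative names.

```python
from typing import List, NamedTuple, Optional, Tuple

Token = Tuple[str, str]  # (kind, text)

class Binding(NamedTuple):
    mutable: bool
    datatype: Optional[str]
    name: str
    value: str  # Expression = Number, kept as its source text

def parse_binding(tokens: List[Token]) -> Binding:
    """Parse exactly one VariableBinding from `tokens`."""
    pos = 0

    def at(kind: str) -> bool:
        return pos < len(tokens) and tokens[pos][0] == kind

    def expect(kind: str, text: Optional[str] = None) -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError(f"expected {text or kind}, found end of input")
        tok_kind, tok_text = tokens[pos]
        if tok_kind != kind or (text is not None and tok_text != text):
            raise SyntaxError(f"expected {text or kind}, found {tok_text!r}")
        pos += 1
        return tok_text

    if at("Var"):
        expect("Var")
        mutable = True
    elif at("Val"):
        expect("Val")
        mutable = False
    elif at("Datatype"):
        mutable = False  # `val` dropped: only valid because a Datatype follows
    else:
        raise SyntaxError("expected `val`, `var` or a datatype")

    datatype = expect("Datatype") if at("Datatype") else None
    name = expect("Identifier")
    expect("Operator", "=")
    value = expect("Float") if at("Float") else expect("Int")
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos][1]!r} after expression")
    return Binding(mutable, datatype, name, value)
```

The branch that accepts a leading `Datatype` implements the dropped-`val` form; there is no equivalent branch for `var`, matching `MutableBinding`.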