Intro
In this series I am creating a transpiler from Python to Golang called Pogo. In the last post we built the bulk of the lexer, and now we're getting ready for the parser.
This Post
The parser turns our slice of tokens into an abstract syntax tree. Each node of the tree will correspond to one token, but our tokens can't be part of a tree as they are, because they have no concept of children. So we will make a new, upgraded kind of token and call it a Structure, as it will define the structure of our program.
The Structure Struct
type Structure struct {
	children []Structure
	code     int
	text     string
}
This may seem very similar to the Token struct, if you remember it from one of the previous posts. One thing to note is that the code property of a Structure will not use the same values as a Token's code. This is because there will be more types of Structures. For example, there will not only be a Structure for the keyword if, but also a Structure for an entire if statement. Furthermore, there will be another Structure for an if-else block, which will hold the Structure of the if, as many elifs as are laid out, and the final else. In this way we separate and group sections of our code into nice chunks that are easy and simple to use.
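To make that grouping concrete, here is a rough, hand-built sketch of how an if-else block might nest. The text values are placeholders of my choosing, and I leave the code values out because the map that defines them comes further down; the exact shape will be decided by the parser in the next post.
// Hypothetical, hand-built nesting for an if-else block.
ifElse := Structure{
	text: "if-else block",
	children: []Structure{
		{text: "if statement", children: []Structure{
			{text: "comparison"},
			{text: "block"},
		}},
		{text: "else statement", children: []Structure{
			{text: "block"},
		}},
	},
}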
For debugging, I have created a method that turns a Structure into a nice readable string. The method is recursive, calling stringify on all of the Structure's children.
// stringify renders a Structure and all of its children as an
// indented, human-readable string (it uses strings.ReplaceAll,
// so the strings package needs to be imported).
func (st Structure) stringify() string {
	text := ""
	for i := 0; i < len(st.children); i++ {
		text += "\n" + st.children[i].stringify()
	}
	return st.text + strings.ReplaceAll(text, "\n", "\n\t")
}
In this method we add a newline for every child, and using strings.ReplaceAll we also indent every child by one extra tab. This means that a child of a child will be shown with two levels of indentation in the terminal.
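As a quick illustration, here is a hand-built two-level tree and what stringify prints for it (this assumes the usual fmt and strings imports in a main package; the tree is not produced by the parser yet).
// A small, hand-built tree to show the indentation.
root := Structure{
	text: "program",
	children: []Structure{
		{text: "if statement", children: []Structure{
			{text: "comparison"},
		}},
	},
}
fmt.Println(root.stringify())
// Output:
// program
// 	if statement
// 		comparison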
Now we are going to make the same sort of map we made for the tokens.
var structureCode map[string]int = map[string]int{
	// Not implemented
	"ILLEGAL": -1,
	// Statements
	"ST_IMPORT":       0,
	"IF_ELSE_BLOCK":   1,
	"ST_IF":           2,
	"ST_ELIF":         3,
	"ST_ELSE":         4,
	"ST_FOR":          5,
	"ST_DECLARATION":  6,
	"ST_MANIPULATION": 7,
	// Other
	"BLOCK":        32,
	"CALL":         33,
	"EXPRESSION":   34,
	"COMPARISON":   35,
	"PROGRAM":      36,
	"ASTERISK":     37,
	"IDENTIFIER":   38,
	"NEWLINE":      39,
	"INDENT":       40,
	"L_PAREN":      41,
	"R_PAREN":      42,
	"L_BLOCK":      43,
	"R_BLOCK":      44,
	"L_SQUIRLY":    45,
	"R_SQUIRLY":    46,
	"SEP":          47,
	"COLON":        48,
	"ANTI_COLON":   49,
	"ASSIGN":       50,
	"UNDETERMINED": 51,
	"COMMENT_ONE":  52,
	// Keywords
	"K_IMPORT": 64,
	"K_FROM":   65,
	"K_FOR":    66,
	"K_IN":     67,
	"K_IF":     68,
	"K_ELIF":   69,
	"K_ELSE":   70,
	// In-built functions
	"IB_PRINT": 96,
	"IB_RANGE": 97,
	// Bool operands
	"BO_NOT": 128,
	"BO_AND": 129,
	"BO_OR":  130,
	// Math operands
	"MO_PLUS":   160,
	"MO_SUB":    161,
	"MO_MUL":    162,
	"MO_DIV":    163,
	"MO_MODULO": 164,
	// Literal
	"L_BOOL":   192,
	"L_INT":    193,
	"L_STRING": 194,
	// Comparison operands
	"CO_EQUALS":     224,
	"CO_NOT_EQUALS": 225,
	"CO_GT":         226,
	"CO_GT_EQUALS":  227,
	"CO_LT":         228,
	"CO_LT_EQUALS":  229,
}
This may look very familiar to the token map; the main difference is the new statements section at the top.
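To tie the map back to the struct: when the parser builds a node it will look its code up here. A minimal hand-written example (the text values are placeholders of my choosing, not something the parser produces yet):
// Building an if-statement node by looking up codes in the map.
ifSt := Structure{
	code: structureCode["ST_IF"],
	text: "if",
	children: []Structure{
		{code: structureCode["COMPARISON"], text: "x == 1"},
		{code: structureCode["BLOCK"], text: "block"},
	},
}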
Why?
Tokens and Structures are very similar, so why didn't I just use one type?
Firstly, it separates the two processes of the lexer and the parser very cleanly.
Secondly, I didn't want to have to give each Token a children field. That would be inefficient space-wise and would add clutter when initializing Tokens in the compiler's source code.
Next
Now that we have created Structures, we can start work on our parser, which will turn our linear slice of Tokens into an abstract syntax tree. This is the last step before we start emitting code, which produces the final Go output and completes the project in its most minimal form.