Let's build a Simple Angular Compiler: Part 1 - boreddev


This series summarizes what I learned while reading Angular's core.
Before I start, let me confide in you a little. When I was learning Angular, I read a lot of blogs and the docs, and even built some real projects with Angular at my company. One thing I realized is that the basics of Angular stick with practice, but advanced topics such as change detection, dynamic components, and lifecycle hooks never really sank in; I almost always had to look at sample code without understanding it. So I decided to dig deeper into Angular's core and put what I found together in this series.


Why do you need to understand how Angular compiles / interprets code?


  • You will understand Angular better, and so write more effective programs.
  • The compile / interpret process applies general principles of computer science. Once you understand how a compiler / interpreter works, you will be able to study or write for other languages / frameworks yourself, such as ReactJS and ReactNative, which all apply an interpreter / compiler.
  • Understanding and writing an interpreter / compiler yourself takes many skills, so doing it is how you sharpen those skills and become a better, more experienced software developer.

OK, first of all, what are a compiler and an interpreter?


The purpose of a compiler or an interpreter is to translate a source program written in a high-level language into another format. Take Angular as an example: the source program contains only a metadata declaration and some HTML syntax. After the compile process (e.g. JIT), it becomes JavaScript code, as shown in the picture.


By the end of this series, we will have built that output format together.
By now you may be wondering what the difference between a compiler and an interpreter is. For the purpose of this series, we agree on the following: if a translator converts a source program into JavaScript code, it is a compiler; if it executes a source program without first converting it to JavaScript, it is an interpreter. Illustrated as follows:
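
The same distinction can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the template is plain text with {{ name }} placeholders, and the function names interpret and compile_to_js are made up for this sketch, not anything taken from Angular.

```python
import json
import re

# Matches {{ name }} placeholders; a simplification of real Angular expressions.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def interpret(template, context):
    """Interpreter: produce the result directly; no JavaScript is generated."""
    return PLACEHOLDER.sub(lambda m: str(context[m.group(1)]), template)


def compile_to_js(template):
    """Compiler: translate the template into JavaScript source code (a string)."""
    parts = []
    pos = 0
    for m in PLACEHOLDER.finditer(template):
        if m.start() > pos:
            # literal text becomes a quoted JavaScript string
            parts.append(json.dumps(template[pos:m.start()]))
        # a placeholder becomes a property lookup on the context object
        parts.append("ctx." + m.group(1))
        pos = m.end()
    if pos < len(template):
        parts.append(json.dumps(template[pos:]))
    return "function render(ctx) { return " + " + ".join(parts or ['""']) + "; }"


print(interpret("Welcome to {{ title }}", {"title": "Angular6"}))
print(compile_to_js("Welcome to {{ title }}"))
```

The first call prints the finished text, Welcome to Angular6. The second prints JavaScript source, function render(ctx) { return "Welcome to " + ctx.title; }, which still has to be run by a JavaScript engine before anything is displayed.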


I hope the concept is somewhat clear now. If not, you can search for more on your own. ^ - ^


So what are we building in this series?


We will create a compiler that compiles the AppComponent declaration below into simple JavaScript code like the image at the top of the article (aka ComponentFactory).


import { Component } from '@angular/core';

@Component({
  selector: 'app-root',
  template: `
    <h1>Welcome to {{ title }}</h1>
  `,
  styleUrls: ['./app.component.css']
})
export class AppComponent {
  title = 'Angular6';
}

I will implement this compiler in Python, but you are free to choose any language you want, because the ideas do not depend on a specific language. OK, are you ready? Let's go ^^


In Angular, the first step of the compile process is to analyze the component's template. As you can see, the template is currently just a string of HTML with no meaning of its own: {{title}} is just text inside the <h1> tag and does not yet act as a binding. Therefore, Angular must convert this HTML string into a format that suits the Angular context. To do so, it must first parse the template into a complete HTML tree. Although JavaScript already has built-in support for parsing HTML, I will rewrite this step in Python so that you can picture what the compiler does.


Below is the source code for the lexical analysis step. I will explain what lexical analysis is further down; for now, save this source code as html_lexer.py and give it a test run.


import html.parser

# Token types
(TAG_OPEN, ATTR, TAG_CLOSE, TEXT, EOF) = ('TAG_OPEN', 'ATTR', 'TAG_CLOSE', 'TEXT', 'EOF')


class Token(object):
    def __init__(self, type, value):
        self.type = type
        self.value = value

    def __str__(self):
        """String representation of the class instance.

        Examples:
            Token(TAG_OPEN, 'h1')
            Token(ATTR, ('class', 'title'))
            Token(TEXT, 'Welcome')
        """
        return 'Token({type}, {value})'.format(
            type=self.type,
            value=repr(self.value)
        )

    def __repr__(self):
        return self.__str__()


class Lexer(html.parser.HTMLParser):
    def __init__(self):
        # initialize the base class
        html.parser.HTMLParser.__init__(self)
        # a list, so that tokens can be appended as they are found
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        print("Start tag:", tag)
        self.tokens.append(Token(TAG_OPEN, tag))
        # attrs is a list of (name, value) pairs
        for attr in attrs:
            print(" attr:", attr)
            self.tokens.append(Token(ATTR, attr))

    def handle_endtag(self, tag):
        self.tokens.append(Token(TAG_CLOSE, tag))
        print("End tag :", tag)

    def handle_data(self, data):
        print("Data :", data)
        self.tokens.append(Token(TEXT, data))

    def read(self, data):
        # feed the whole string to the parser, then mark the end of input
        self.feed(data)
        self.tokens.append(Token(EOF, None))
        print(self.tokens)
        return self


def main():
    lexer = Lexer()
    # named `template` so the local does not shadow the imported `html` module
    template = "<h1>Welcome to {{ title }}</h1>"
    lexer.read(template)


if __name__ == '__main__':
    main()

Test run the code above:


python html_lexer.py
Start tag: h1
Data : Welcome to {{ title }}
End tag : h1
[Token(TAG_OPEN, 'h1'), Token(TEXT, 'Welcome to {{ title }}'), Token(TAG_CLOSE, 'h1'), Token(EOF, None)]

For the code above to tokenize your input as expected, your HTML string needs to follow the rules here:

http://www.w3.org/TR/html51/syntax.html#writing

There is one more rule:


  • I keep things as minimal as possible here, so every opening tag must have a closing tag. Even a tag that normally has no closing tag must be given one if you use it. It's a bit ridiculous, but please just accept that. My purpose is only to understand the flow.
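
The "every opening tag needs a closing tag" rule can be checked mechanically once the input is a token stream. Here is a minimal sketch, assuming tokens are plain (type, value) pairs rather than the Token class above; the helper name is_balanced is hypothetical and is not part of html_lexer.py.

```python
TAG_OPEN, TAG_CLOSE, TEXT = 'TAG_OPEN', 'TAG_CLOSE', 'TEXT'


def is_balanced(tokens):
    """Return True if every TAG_OPEN is closed by a matching TAG_CLOSE, in order."""
    stack = []
    for kind, value in tokens:
        if kind == TAG_OPEN:
            # remember which tag is waiting for its closing tag
            stack.append(value)
        elif kind == TAG_CLOSE:
            # the closing tag must match the most recently opened tag
            if not stack or stack.pop() != value:
                return False
    # anything left on the stack was opened but never closed
    return not stack


good = [(TAG_OPEN, 'h1'), (TEXT, 'Welcome'), (TAG_CLOSE, 'h1')]
bad = [(TAG_OPEN, 'div'), (TAG_OPEN, 'img'), (TAG_CLOSE, 'div')]
print(is_balanced(good))  # True
print(is_balanced(bad))   # False
```

The second stream fails because img is opened and never closed, which is exactly the case the rule above forbids.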

Okay, now let's dive into the code and see what the compiler is doing. For the compiler to understand what is inside the HTML string, it first needs to break that string into components called tokens. Each token is an object with a type and a value. For example, the string "<h1>" has the type TAG_OPEN and the corresponding value 'h1'.


The process of breaking an input string into tokens is called lexical analysis. So the first step your compiler needs to do is read the input string and convert it into a stream of tokens. The part of the compiler that does this is called the lexical analyzer, or lexer for short. You may also have heard other names for it, such as scanner or tokenizer. They all mean the same thing: the part of the compiler that reads the input string and converts it into a stream of tokens.
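
To make the idea concrete without any library, here is a minimal hand-written lexer sketch. It assumes tags have no attributes and ignores comments and entities; the regular expression and the function name tokenize are choices made for this illustration, not how Angular or html.parser does it.

```python
import re

TAG_OPEN, TAG_CLOSE, TEXT, EOF = 'TAG_OPEN', 'TAG_CLOSE', 'TEXT', 'EOF'

# One alternative per token kind; attributes are deliberately not supported.
TOKEN_RE = re.compile(r"</(?P<close>\w+)>|<(?P<open>\w+)>|(?P<text>[^<]+)")


def tokenize(source):
    """Scan the string left to right and yield (type, value) pairs."""
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError("unexpected character at position %d" % pos)
        if match.group('close'):
            yield (TAG_CLOSE, match.group('close'))
        elif match.group('open'):
            yield (TAG_OPEN, match.group('open'))
        else:
            yield (TEXT, match.group('text'))
        # continue scanning right after the token just consumed
        pos = match.end()
    # mark the end of input so a later parser knows when to stop
    yield (EOF, None)


tokens = list(tokenize("<h1>Welcome to {{ title }}</h1>"))
print(tokens)
```

For this input the sketch yields the same four tokens as the test run above: TAG_OPEN, TEXT, TAG_CLOSE, and EOF.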


Here I use Python's HTMLParser library to parse the HTML syntax for me; once the syntax has been parsed, I just need to divide it into tokens. For example:
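
The sketch below is a condensed stand-in for the Lexer class above (the class name MiniLexer is made up for this example, and tokens are plain tuples instead of Token objects). It feeds a tag with an attribute through html.parser to show which callback produces which token.

```python
from html.parser import HTMLParser


class MiniLexer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # called once per opening tag; attrs is a list of (name, value) pairs
        self.tokens.append(('TAG_OPEN', tag))
        self.tokens.extend(('ATTR', attr) for attr in attrs)

    def handle_endtag(self, tag):
        # called once per closing tag
        self.tokens.append(('TAG_CLOSE', tag))

    def handle_data(self, data):
        # called for the text between tags
        self.tokens.append(('TEXT', data))


lexer = MiniLexer()
lexer.feed('<h1 class="title">Welcome to {{ title }}</h1>')
lexer.close()
for token in lexer.tokens:
    print(token)
```

Note that the attribute arrives as a single ('class', 'title') pair, so one ATTR token carries both the attribute name and its value.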


You can also put your own HTML string into the code and give it a test run. Just remember my extra rule above.

OK, I think that is enough for today's lesson. Remember to review the code carefully until you understand it, and keep practicing. Also, answer these questions yourself to reinforce what you learned today:

1, What is an interpreter?

2, What is a compiler?

3, What is the difference between a compiler and an interpreter?

4, What is a token?

5, What is the process of breaking input into tokens called?

6, What is the part of the compiler that does lexical analysis called?

7, What are other common names for that part of the compiler?


In the next article I will analyze how to parse the stream of tokens produced by the lexer above. You may still wonder why we start from this HTML parsing step; it is the basic foundation for understanding how a compiler works. Later you will see how much of this knowledge and these rules Angular applies to create its compiler.


Explore more:


https://ruslanspivak.com/lsbasi-part1/


https://blog.mgechev.com/2017/09/16/developing-simple-interpreter-transpiler-compiler-tutorial/

