[ref] Lexing #93

Open
ThakeeNathees opened this issue Jun 17, 2021 · 0 comments
Labels
reference An issue thread to learn and use it as a reference

Comments

@ThakeeNathees
Owner

The pocketlang compiler will read the source (a string) and generate bytecode (an array of bytes) which can be interpreted by the pocket VM.

Compilation is a two-step process:

  • Lexing: Make tokens from the source string (see below).
  • Parsing: Generate a parse tree (in multi-pass compilers) or bytecode (in single-pass compilers) according to the language grammar (this will be covered in a new ref issue).

Pocketlang uses a single-pass compiler, but for simplicity we'll look at lexing and parsing separately, even though in practice each token is parsed as soon as it is lexed.

Tokens

Tokens are the result of the lexing process. A token has a type (TokenType) and optionally a value. Here is a simplified version of pocketlang's token type and token declarations (see the source reference).

typedef enum {
  TK_LPARAN, // '('   -- no token value
  TK_RPARAN, // ')'   -- no token value
  TK_NAME,   // foo   -- token value = name of the identifier
  TK_STRING, // "foo" -- token value = the string literal
  ...
} TokenType;

typedef struct {
  TokenType type; // which kind of token this is
  Var value;      // the token value, if the token type has one
} Token;

Lexing

The compiler reads the source string (a sequence of characters) and produces a corresponding sequence of tokens. It then uses these tokens to generate bytecode.
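As a rough illustration of turning characters into tokens, here is a minimal sketch of a lexer loop. It is not pocketlang's implementation: `MiniType`, `MiniToken`, and `mini_lex` are hypothetical names, the token set is reduced to parentheses, names, and decimal numbers, and tokens are collected into an array rather than being handed to the parser one at a time.

```c
#include <ctype.h>
#include <stddef.h>

// Hypothetical, reduced token set; not pocketlang's real TokenType.
typedef enum { MT_LPAREN, MT_RPAREN, MT_NAME, MT_NUMBER, MT_ERROR } MiniType;

typedef struct {
  MiniType type;
  const char* start; // first character of the token in the source
  size_t length;     // number of characters in the token
} MiniToken;

// Lex `source` into `out` (at most `cap` tokens) and return the number of
// tokens written. An unrecognized character is emitted as a final MT_ERROR
// token and lexing stops there.
size_t mini_lex(const char* source, MiniToken* out, size_t cap) {
  size_t count = 0;
  const char* c = source;

  while (*c != '\0' && count < cap) {
    // Whitespace separates tokens but produces none.
    if (isspace((unsigned char)*c)) { c++; continue; }

    MiniToken tk = { MT_ERROR, c, 1 };

    if (*c == '(') {
      tk.type = MT_LPAREN;
      c++;
    } else if (*c == ')') {
      tk.type = MT_RPAREN;
      c++;
    } else if (isalpha((unsigned char)*c) || *c == '_') {
      // A name: a letter or '_' followed by letters, digits, or '_'.
      tk.type = MT_NAME;
      while (isalnum((unsigned char)*c) || *c == '_') c++;
      tk.length = (size_t)(c - tk.start);
    } else if (isdigit((unsigned char)*c)) {
      // A decimal number: one or more digits.
      tk.type = MT_NUMBER;
      while (isdigit((unsigned char)*c)) c++;
      tk.length = (size_t)(c - tk.start);
    } else {
      out[count++] = tk; // still MT_ERROR
      break;
    }

    out[count++] = tk;
  }
  return count;
}
```

Under these assumptions, lexing `print(42)` yields four tokens: a name, a left parenthesis, a number, and a right parenthesis. Storing a pointer and a length instead of copying the text keeps the sketch allocation-free.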

The tokens can be classified into:

  • separators - ( ) { } [ ] . .. , ;
  • operators - + - * / = & | ^
  • literals - 42 false "foo" 0b10110
  • keywords - def import if break
  • names - foo bar print input
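Keywords and names look the same to the character scanner, so one common way to separate them is to scan an identifier first and then look it up in a keyword table. The sketch below assumes that approach; `Keyword`, `KEYWORDS`, and `classify_identifier` are hypothetical names, and the table holds only the four keywords listed above, not the full pocketlang keyword set.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical keyword ids; KW_NONE means "an ordinary name".
typedef enum { KW_NONE, KW_DEF, KW_IMPORT, KW_IF, KW_BREAK } Keyword;

// Only the keywords mentioned in this thread; the real list is longer.
static const struct {
  const char* text;
  Keyword kw;
} KEYWORDS[] = {
  { "def",    KW_DEF    },
  { "import", KW_IMPORT },
  { "if",     KW_IF     },
  { "break",  KW_BREAK  },
};

// Classify the identifier that starts at `start` and spans `length`
// characters. The identifier does not need to be NUL-terminated.
Keyword classify_identifier(const char* start, size_t length) {
  size_t n = sizeof(KEYWORDS) / sizeof(KEYWORDS[0]);
  for (size_t i = 0; i < n; i++) {
    // Compare lengths first so that "define" does not match "def".
    if (strlen(KEYWORDS[i].text) == length &&
        strncmp(start, KEYWORDS[i].text, length) == 0) {
      return KEYWORDS[i].kw;
    }
  }
  return KW_NONE; // not a keyword, so it would become a name token
}
```

With this table, `def` and `if` classify as keywords, while `print` and `define` fall through to `KW_NONE` and would be emitted as name tokens.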

Except for keywords, every identifier is tokenized as a name token (TK_NAME); the parser later determines whether it refers to a built-in function, a variable, an imported module, etc. Names that aren't defined are semantic errors reported by the parser. The lexer only cares whether it can make a valid token out of the input. Here are some lexing errors:

  • Unterminated strings: x = "foo
  • Invalid literals: 0b123456abc
  • Any other place where the lexError() function is called.
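The first two error cases above can be sketched as small checks. This is only an illustration under stated assumptions, not pocketlang's code: `scan_string` and `is_valid_binary_literal` are hypothetical helpers, the backslash-escape handling is an assumption, and a real lexer would report the failure through its error function instead of returning a flag.

```c
#include <stdbool.h>
#include <stddef.h>

// `s` must point at an opening double quote. Returns the index just past
// the closing quote, or -1 if the input ends before the string is closed.
// Assumption: a backslash escapes the character that follows it.
long scan_string(const char* s) {
  if (s[0] != '"') return -1;
  for (size_t i = 1; s[i] != '\0'; i++) {
    if (s[i] == '\\' && s[i + 1] != '\0') {
      i++; // skip the escaped character
      continue;
    }
    if (s[i] == '"') return (long)(i + 1);
  }
  return -1; // reached the end of input: unterminated string
}

// Returns true only if `s` is "0b" followed by one or more binary digits.
bool is_valid_binary_literal(const char* s) {
  if (s[0] != '0' || s[1] != 'b') return false;
  const char* c = s + 2;
  if (*c == '\0') return false; // "0b" with no digits
  for (; *c != '\0'; c++) {
    if (*c != '0' && *c != '1') return false;
  }
  return true;
}
```

Here `"foo` is rejected because the input ends before a closing quote is found, and `0b123456abc` is rejected at the first character that is not `0` or `1`.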

Each class of token is tokenized by its own lexer helper function (source). Those helpers are wrapped by the lexToken(Compiler* compiler) function (source).

Use this thread to discuss the lexing process of pocketlang.

@ThakeeNathees ThakeeNathees added the reference An issue thread to learn and use it as a reference label Jun 17, 2021
@ThakeeNathees ThakeeNathees changed the title [ref] Lexexing [ref] Lexing Jun 17, 2021