Skip to content

This repository contains a Python implementation of a toy C compiler, that passes all of the tests for Chapter 1 of nlsandler / writing-a-c-compiler-tests. It uses the Visitor Pattern to traverse and process the Abstract Syntax Tree (AST) during the compilation process.

License

Notifications You must be signed in to change notification settings

JamesSullivan/python-visitor-c-compiler

Repository files navigation

Writing a C Compiler Chapter 1: Python Implementation using the Visitor Pattern

This repository contains a Python implementation of a toy C compiler, that passes all of the tests for Chapter 1 of nlsandler / writing-a-c-compiler-tests. It uses the Visitor Pattern to traverse and process the Abstract Syntax Tree (AST) during the compilation process. The compiler is designed as a learning tool to understand the key stages of compilation, including lexical analysis, parsing, assembly ast generation, and emmitting assembly code.


Overview

The compiler is implemented in Python and follows a modular design, with each stage of the compilation process clearly separated. The Visitor Pattern is used extensively to traverse and manipulate the AST, making the code modular, extensible, and easy to understand.

Key Features:

  1. Lexical Analysis: Converts the source code into a stream of tokens.
  2. Parsing: Builds an Abstract Syntax Tree (AST) from the tokens.
  3. assembly ast generation: Generates an assembly AST.
  4. Emmiting code: Emits assmebly code and links it into an executable using GCC.
  5. Visitor Pattern: Used to traverse and process the AST during parsing and code generation.

Compilation Stages

The compiler follows these stages to transform a C source file into an executable:

  1. Preprocessing:

    • The source file is preprocessed using GCC to handle macros and includes.
    • Output: A preprocessed .i file.
  2. Lexical Analysis:

    • The preprocessed file is tokenized into a list of tokens (e.g., keywords, identifiers, operators).
    • Output: A list of Token objects.
  3. Parsing:

    • The tokens are parsed into an Abstract Syntax Tree (AST) using a recursive descent parser.
    • Output: An AST representing the structure of the program.
  4. Assembly AST Generation

    • Generates an assembly AST.
  5. Assembly Code:

    • The AST is traversed using the Visitor Pattern to generate assembly code.
    • Output: Assembly code in a .s file.
  6. Assembly and Linking:

    • The assembly code is converted into machine code and linked into an executable using GCC.
    • Output: An executable file (.exe on Windows or no extension on Unix-like systems).

Visitor Pattern in Action

The Visitor Pattern is used to separate the logic for traversing the AST from the operations performed on it. This makes the code modular and allows for easy extension. For example:

  • ParserPrinter: A visitor that prints the AST in a human-readable format.
  • GeneratorPrinter: A visitor that generates assembly code from the AST.
  • CodeEmitter: A visitor that emits machine code from the assembly AST.

Each visitor implements a visit method for each node type in the AST, allowing for custom behavior at each step.


Requirements

  • Python 3.7 or higher
  • GCC (for preprocessing and linking)
  • Linux or Mac or Windows OS

Usage

  1. Clone the repository:

    git clone https://github.com/https://github.com/JamesSullivan/python-visitor-c-compiler.git
    cd python-visitor-c-compiler
  2. Run the compiler:

    python compiler.py <path-to-source-file.c> [options]

Command-Line Options:

  • --lex: Stop after lexical analysis and print the tokens.
  • --parse: Stop after parsing and printing the AST.
  • --codegen: Stop after generating and printing the assembly AST.
  • -S: Stop after generating the assembly file (.s).

Example:

python compiler.py return_2.c --parse

Example Workflow

  1. Input File (return_2.c):

     int main(void) {
         return 2;
     }
  2. Preprocessing:

    • Generates return_2.i.
  3. Lexical Analysis:

    • Outputs a list of tokens:
      Token(type='INT', value='int', line=1, column=1)
      Token(type='ID', value='main', line=1, column=5)
      Token(type='LPAREN', value='(', line=1, column=9)
      Token(type='VOID', value='void', line=1, column=10)
      Token(type='RPAREN', value=')', line=1, column=14)
      Token(type='LCUB', value='{', line=1, column=16)
      Token(type='NEWLINE', value='\n', line=1, column=17)
      Token(type='RETURN', value='return', line=2, column=2)
      Token(type='CONSTANT', value='2', line=2, column=9)
      Token(type='END', value=';', line=2, column=10)
      Token(type='NEWLINE', value='\n', line=2, column=11)
      Token(type='RCUB', value='}', line=3, column=1)
      Token(type='NEWLINE', value='\n', line=3, column=2)
      
  4. Parsing:

    • Outputs the AST:
      Program
      Function: main
          Return: 2
      
  5. Assembly AST Generation

    • Outputs the Assembly AST:
      Program:
       Function: main
       Mov:
         Source: Immediate(2)
         Destination: Register(eax)
       Ret
      
  6. Code Generation:

    • Outputs assembly code:
      	.global main
      main:
      	movl	$2, %eax
      	ret
      	.section .note.GNU-stack,"",@progbits
  7. Assembly and Linking:

    • Generates an executable file (return2 or return2.exe).

Directory Structure

python-visitor-c-compiler/
├── assembly/             # Assembly AST generation module
│   ├── ast.py
│   ├── generator.py
│   └── generatorprinter.py
├── parser/               # Parsing module to create AST
│   ├── ast.py
│   ├── parser.py
│   └── parserprinter.py
├── codeemitter.py        # Assembly code emission
├── compiler.py           # Main driver program
├── compiler.sh           # Convenience for Linux users
├── lexer.py              # Tokenizes source code
├── LICENSE
├── mistake_lex.c         # Example C source with token error
├── mistake_parse.c       # Example C source with syntax error
├── README.md
└── return_2.c            # Example C valid source file

Contributing

Contributions are welcome! If you find a bug or want to add a feature, please open an issue or submit a pull request.


License

This project is licensed under the MIT License. See the LICENSE file for details.


Acknowledgments


Happy compiling! 🚀

About

This repository contains a Python implementation of a toy C compiler, that passes all of the tests for Chapter 1 of nlsandler / writing-a-c-compiler-tests. It uses the Visitor Pattern to traverse and process the Abstract Syntax Tree (AST) during the compilation process.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages