A tree-walking interpreter for C-Lite, a pedagogical subset of the C programming language.
Built from scratch in Python - lexer, parser, AST, symbol table, and evaluator - with no external runtime dependencies.
Quick Start | Language Reference | Architecture | Testing | Contributing | License
- Overview
- Quick Start
- Language Reference
- Architecture
- Directory Layout
- Usage
- Examples
- Testing
- Design Decisions
- Known Limitations
- Performance
- Contributing
- License
C-Lite is a simplified, statically-typed subset of C designed to demonstrate the full program translation pipeline from raw source text to evaluated output without the complexity of a production C compiler.
The interpreter implements three classical compilation phases:
| Phase | Component | Responsibility |
|---|---|---|
| 1 | Lexer | Tokenizes source text into a stream of classified tokens |
| 2 | Parser | Builds an Abstract Syntax Tree (AST) via recursive-descent parsing |
| 3 | Interpreter | Walks the AST using the Visitor pattern to evaluate the program |
- Zero external runtime dependencies - only the Python 3.8+ standard library
- Three-phase pipeline - lexer, parser, and tree-walking interpreter
- LL(1) recursive-descent parser with formal EBNF grammar
- Visitor-pattern AST evaluation with type-checked symbol table
- Nested block scoping with variable shadowing and lifetime management
- Implicit type coercion following C semantics (
int<->float) - Precise error reporting with line/column source locations
- Interactive REPL with multi-line input, state inspection, and history
- 293 passing tests across 7 test modules covering all interpreter phases
| Requirement | Version |
|---|---|
| Python | >= 3.8 |
| pip | any |
# Clone the repository
git clone https://github.com/ChethiyaB/C-Lite-Interpreter.git
cd C-Lite-Interpreter
# Install development dependencies (pytest)
pip install -r requirements.txtCreate a file hello.clt:
int x;
x = 42;
printf(x);Execute it:
python cli.py hello.clt42
python repl.py>>> int x;
>>> x = 10;
>>> printf(x);
10
>>> if (x > 5) { printf(1); } else { printf(0); }
1
| Type | Description | Literal Examples |
|---|---|---|
int |
Signed integer | 0, 42, 1000 |
float |
IEEE 754 double-precision floating-point | 3.14, .5, 7., 0.0 |
Variables must be declared before use. Declaration and assignment are separate statements:
int x; // declaration
x = 42; // assignment
float pi;
pi = 3.14159;Note: Using a variable before initialization raises a
SemanticError.
| Operator | Description | Example | Result Type |
|---|---|---|---|
+ |
Addition | 3 + 4 |
int or float |
- |
Subtraction | 10 - 3 |
int or float |
* |
Multiplication | 5 * 2 |
int or float |
/ |
Division | 7 / 2 |
int (truncated) or float |
Division semantics:
int / int-> integer (floor) division:7 / 2=3float / intorint / float-> float division:7.0 / 2=3.5- Division by zero raises
SemanticError
| Operator | Description | Returns |
|---|---|---|
> |
Greater than | 1 (true) or 0 (false) |
< |
Less than | 1 (true) or 0 (false) |
== |
Equal to | 1 (true) or 0 (false) |
Relational operators return integer values (1 or 0), consistent with C semantics.
| Operator | Description | Example |
|---|---|---|
+ |
Unary positive | +5 |
- |
Unary negation | -x |
| Operator | Description | Example |
|---|---|---|
= |
Assignment | x = 5; |
Precedence Operators Associativity Description
─────────────────────────────────────────────────────────────
1 ( ) N/A Parentheses (grouping)
2 + - (unary) right Unary positive / negation
3 * / left Multiplicative
4 + - (binary) left Additive
5 > < == left Relational / equality
6 = N/A Assignment (statement-level)
if (condition) {
// then-branch
} else {
// else-branch (optional)
}- Condition truthiness follows C rules:
0and0.0are false; all other values are true. - The
elseclause is optional. - Dangling
elsebinds to the nearestif(standard C behavior).
Evaluates the expression and prints the result to standard output:
printf(42); // prints: 42
printf(3.14); // prints: 3.14
printf(x + y); // prints the computed sum- Integer-valued floats are printed without a decimal point (e.g.,
6.0prints as6).
Single-line comments are supported with //:
int x; // this is a comment
x = 42; // assign value
// this entire line is a commentBlocks ({ }) introduce new lexical scopes. Variables declared inside a block are destroyed when the block exits:
int x;
x = 10;
{
int y; // y exists only inside this block
y = 20;
int x; // shadows the outer x
x = 99;
printf(x); // prints: 99
}
printf(x); // prints: 10 (outer x restored)
// printf(y); // ERROR: y is not accessible hereVariable shadowing is supported: an inner scope may declare a variable with the same name as an outer scope. The inner variable takes precedence within its scope.
C-Lite performs implicit type coercion following C semantics:
| Context | Rule | Example |
|---|---|---|
int var <- float expr |
Truncation (toward zero) | int x; x = 3.14; -> x == 3 |
float var <- int expr |
Promotion | float y; y = 10; -> y == 10.0 |
| Mixed arithmetic | Promote to float |
3 + 2.5 -> 5.5 |
| Mixed comparison | Promote to float |
3 == 3.0 -> 1 (true) |
The complete grammar specification is available at docs/grammar.ebnf. A summary:
program = { declaration | statement } ;
declaration = type_specifier identifier ";" ;
type_specifier = "int" | "float" ;
statement = assignment | if_statement | printf_call | block | ";" ;
assignment = identifier "=" expression ";" ;
if_statement = "if" "(" expression ")" statement [ "else" statement ] ;
block = "{" { declaration | statement } "}" ;
printf_call = "printf" "(" expression ")" ";" ;
expression = relational_expression ;
relational_expression = additive_expression [ ( ">" | "<" | "==" ) additive_expression ] ;
additive_expression = multiplicative_expression { ( "+" | "-" ) multiplicative_expression } ;
multiplicative_expression = primary_expression { ( "*" | "/" ) primary_expression } ;
primary_expression = number | identifier | "(" expression ")" | unary_expression ;
unary_expression = ( "+" | "-" ) primary_expression ;Grammar properties:
- LL(1) compatible -- no left-recursion, disjoint FIRST/FOLLOW sets
- Suitable for recursive-descent parsing without backtracking
- Complete FIRST/FOLLOW set analysis available in
docs/first_follow_sets.md
The interpreter processes source code through three sequential phases, each transforming the input into a progressively higher-level representation:
Module: src/lexer.py
The lexer (scanner) converts raw source text into a stream of Token objects. Each token carries its type, value, and source location (line and column).
Source: "int x = 42;"
Tokens: [
Token(int, "int", 1:1),
Token(identifier, "x", 1:5),
Token(=, "=", 1:7),
Token(integer, 42, 1:9),
Token(;, ";", 1:11),
Token(eof, None, 1:12)
]
Implementation details:
- Hand-written DFA-based scanner (no parser generators)
- Single-pass
O(n)tokenization via character-by-character advancement - Generator-based token emission (
yield) for memory efficiency - Infinite-loop guard to prevent runaway scanning
- Escape sequence support in string literals (
\n,\t,\",\\)
Module: src/parser.py
The parser consumes the token stream and constructs an Abstract Syntax Tree (AST). It uses LL(1) recursive descent - each grammar production is implemented as a method.
Source: "x = 3 + 4 * 5;"
AST:
Assignment
├── name: "x"
└── value: BinaryOp(+)
├── left: NumberLiteral(3)
└── right: BinaryOp(*)
├── left: NumberLiteral(4)
└── right: NumberLiteral(5)
Implementation details:
- One-token lookahead for all parsing decisions
- Left-associative binary operators via iterative loops (not left-recursion)
- Operator precedence enforced through grammar stratification (4 levels)
- Immutable
@dataclass(frozen=True)AST nodes
Module: src/interpreter.py
The interpreter traverses the AST using the Visitor pattern (ASTVisitor base class). Each AST node type has a corresponding visit_* method.
AST Traversal:
visit_program
├── visit_declaration → SymbolTable.declare("x", int)
├── visit_assignment → SymbolTable.update("x", 42)
│ └── visit_number_literal → returns 42
└── visit_printf_call → output.append(42)
└── visit_identifier → SymbolTable.get_value("x") → 42
Implementation details:
- Stack-based
SymbolTablewith nested scope support (enter/exit scope on block boundaries) - C-style truthiness for conditionals (
0/0.0= false, non-zero = true) - Implicit type coercion on assignment and mixed-type arithmetic
- Division-by-zero detection with near-zero float threshold (
1e-10) - All
printf()output collected in a list for testability
The interpreter uses a structured exception hierarchy with precise source locations:
CLiteError (base)
├── LexerError — invalid characters, malformed literals, unclosed strings
├── ParserError — syntax errors, unexpected tokens, unmatched delimiters
└── SemanticError — undefined variables, uninitialized access, division by zero,
duplicate declarations in same scope
All errors include line, column, and a human-readable message:
SemanticError at line 5, column 1: Undefined variable 'y'
ParserError at line 3, column 12: Expected ')' after condition
LexerError at line 1, column 6: Unexpected character '@'
The full AST hierarchy is defined in src/ast.py:
ASTNode (abstract base)
│
├── Program — root node: declarations[] + statements[]
│
├── Declaration — variable declaration: type_specifier identifier ;
├── Assignment — variable assignment: identifier = expression ;
├── IfStatement — conditional: if (expr) stmt [else stmt]
├── Block — scoped block: { declaration | statement }
├── PrintfCall — output: printf(expression) ;
├── EmptyStatement — no-op: ;
│
├── BinaryOp — binary expression: left op right
├── UnaryOp — unary expression: op operand
├── NumberLiteral — numeric constant: int or float
└── Identifier — variable reference: name
All nodes are immutable (@dataclass(frozen=True)) and carry source location (line, column).
Defined in src/token.py:
| Category | Tokens |
|---|---|
| Keywords | int, float, if, else, printf |
| Literals | INT_LITERAL, FLOAT_LITERAL, STRING_LITERAL |
| Identifiers | IDENTIFIER |
| Arithmetic | +, -, *, / |
| Relational | >, <, == |
| Assignment | = |
| Delimiters | (, ), {, }, ; |
| Meta | COMMENT, EOF |
C-Lite-Interpreter/
├── cli.py # Command-line interface entry point
├── repl.py # Interactive REPL entry point
├── requirements.txt # Development dependencies (pytest)
├── LICENSE # MIT License
│
├── src/ # Interpreter source code
│ ├── __init__.py # Public API exports
│ ├── token.py # Token and TokenType definitions
│ ├── lexer.py # Lexical analyzer (scanner)
│ ├── ast.py # AST node definitions + Visitor base class
│ ├── parser.py # Recursive descent parser
│ ├── symbol_table.py # Stack-based symbol table with scoping
│ ├── interpreter.py # Tree-walking evaluator (Visitor pattern)
│ └── errors.py # Exception hierarchy (CLiteError family)
│
├── tests/ # Test suite (293 tests)
│ ├── __init__.py
│ ├── test_lexer.py # Lexer tests (33 tests)
│ ├── test_parser.py # Parser tests (64 tests)
│ ├── test_interpreter.py # Interpreter tests (46 tests)
│ ├── test_ast.py # AST structure tests (57 tests)
│ ├── test_symbol_table.py# Symbol table tests (38 tests)
│ ├── test_cli.py # CLI integration tests (24 tests)
│ └── test_repl.py # REPL integration tests (31 tests)
│
├── examples/ # Example C-Lite programs
│ ├── hello.clt # Basic variable and printf
│ ├── arithmetic.clt # Arithmetic operations
│ ├── control_flow.clt # if-else branching
│ ├── float_operations.clt# Floating-point arithmetic
│ ├── nested_scopes.clt # Nested block scoping
│ └── type_coercion.clt # Implicit type conversion
│
└── docs/ # Design documentation
├── grammar.ebnf # Formal EBNF grammar specification
├── first_follow_sets.md# FIRST/FOLLOW set analysis
├── grammar_validation.md# Grammar validation examples & parse trees
└── design_decisions.md # Architectural rationale (18 decisions)
usage: cli.py [-h] [-e CODE] [-v] [--version] [source_file]
C-Lite Interpreter
positional arguments:
source_file C-Lite source file (.clt extension)
optional arguments:
-h, --help show this help message and exit
-e CODE, --execute CODE
Execute C-Lite code directly from command line
-v, --verbose Show detailed phase information
--version show program's version number and exit
Execute a file:
python cli.py examples/hello.clt
# Output: 42Execute inline code:
python cli.py -e "int x; x = 10; printf(x);"
# Output: 10Verbose mode (shows lexer tokens, AST, and execution details):
python cli.py -v examples/control_flow.cltSupported file extensions: .clt (recommended), .c, .txt
Launch the REPL with python repl.py:
| Command | Description |
|---|---|
:help |
Show help message with language reference |
:vars |
Display all variables in the current symbol table |
:tokens |
Show tokens from last parsed input |
:ast |
Show AST from last parsed input |
:verbose |
Enable verbose mode (show all phases) |
:quiet |
Disable verbose mode |
:clear / :reset |
Clear interpreter state (reset all variables) |
:version |
Show version information |
:exit / :quit |
Exit REPL (also Ctrl+D) |
REPL features:
- Multi-line input -- automatically detects unclosed braces/parentheses and waits for continuation
- State persistence -- variables declared in one command are available in subsequent commands
- Output isolation -- each command shows only its own
printfoutput - Error recovery -- errors do not terminate the session; type the next command to continue
Example session:
>>> int x;
>>> int y;
>>> x = 10;
>>> y = x * 2;
>>> printf(y);
20
>>> if (y > 15) { printf(1); } else { printf(0); }
1
>>> :vars
Variables in scope:
Scope 0:
x (int) = 10
y (int) = 20
>>> :clear
Interpreter state cleared.
The interpreter components can be used as a Python library:
from src.lexer import Lexer
from src.parser import Parser
from src.interpreter import Interpreter
# Tokenize
source = "int x; x = 42; printf(x);"
tokens = list(Lexer(source).tokenize())
# Parse
ast = Parser(tokens).parse()
# Execute
interpreter = Interpreter()
output = interpreter.execute(ast)
print(output) # [42]Accessing the symbol table after execution:
interpreter = Interpreter()
interpreter.execute(ast)
# Inspect variables
symbol = interpreter.symbol_table.lookup("x")
print(symbol.name, symbol.var_type, symbol.value) # x int 42// examples/hello.clt
int x;
x = 42;
printf(x);$ python cli.py examples/hello.clt
42
// examples/type_coercion.clt
int a;
float b;
float result;
a = 10;
b = 3.5;
result = a + b; // int + float → float
printf(result); // Output: 13.5
a = b; // float → int truncation
printf(a); // Output: 3$ python cli.py examples/type_coercion.clt
13.5
3
// examples/control_flow.clt
int x;
x = 5;
if (x > 3) {
printf(x); // Output: 5
} else {
x = 1;
printf(x);
}$ python cli.py examples/control_flow.clt
5
// examples/nested_scopes.clt
int x;
x = 10;
{
int y;
y = 20;
{
int z;
z = x + y; // accesses outer variables
printf(z); // Output: 30
}
}$ python cli.py examples/nested_scopes.clt
30
// examples/float_operations.clt
float x;
float y;
float result;
x = 3.14;
y = 2.86;
result = x + y;
printf(result); // Output: 6
result = x * y;
printf(result); // Output: 8.9804$ python cli.py examples/float_operations.clt
6
8.9804
# Run all tests
python -m pytest
# Run with verbose output
python -m pytest -v
# Run a specific test module
python -m pytest tests/test_lexer.py
# Run a specific test class
python -m pytest tests/test_interpreter.py::TestBooleanSemantics
# Run a single test
python -m pytest tests/test_parser.py::TestASTStructuralValidation::test_full_tree_validation_precedence
# Run with short traceback
python -m pytest --tb=short -q| Module | Tests | Description |
|---|---|---|
test_lexer.py |
33 | Token recognition, source location tracking, numeric/string literals, comments, error handling, boundary/stress tests |
test_parser.py |
64 | Syntax error detection, deep nesting, float edge cases, AST structural validation, recovery behavior, operator precedence, fuzzing, EOF edge cases, performance |
test_interpreter.py |
46 | Boolean semantics, relational evaluation, type coercion, uninitialized variables, nested blocks, expression stress, control flow, printf ordering, division edge cases, variable lifetime, isolation |
test_ast.py |
57 | Expression tree composition, block scope nesting, if-statement traversal, visitor contract, equality comparison, immutability, program node edge cases, location integrity |
test_symbol_table.py |
38 | Type enforcement, variable shadowing, deep nesting (50-200 levels), scope re-entrancy, memory integrity, error messages, performance (10,000 declarations), float precision |
test_cli.py |
24 | File execution, inline code, verbose mode, error reporting, file encoding, version flag, edge cases |
test_repl.py |
31 | Statement execution, multi-line input, state persistence, REPL commands, error recovery, output isolation |
| Total | 293 | All passing |
The test suite systematically covers the following categories:
Lexer Tests:
- Token recognition (keywords, operators, delimiters)
- Source location tracking (line/column across multi-line input)
- Identifier validation (C naming rules, edge cases)
- Numeric literal parsing (integers, floats, edge cases like
.5,3., errors like1.2.3) - String literal parsing (escape sequences, unclosed strings)
- Comment handling (single-line
//, empty comments, comments at EOF) - Error detection (invalid characters with precise location)
- Boundary/stress tests (1000-char identifiers, large integers, repeated operators)
Parser Tests:
- Negative path combinatorics (trailing operators, unclosed parens/braces, double operators)
- Boundary conditions (50-level nesting, 1000-term expression chains)
- Float literal behavior (leading/trailing dot, zero, double-dot errors)
- AST structural validation (full tree topology for complex expressions)
- Error recovery behavior (fail-fast, error location quality)
- Declaration rules (missing identifier/semicolon, interleaving with statements)
- Operator precedence (relational vs. additive vs. multiplicative)
- Fuzzing (random characters, whitespace-only, empty input, semicolons-only)
- EOF edge cases (truncated if/printf/assignment/declaration)
- Performance (parsing time scaling, memory usage)
Interpreter Tests:
- Boolean semantics (C-style truthiness:
0,0.0= false) - Relational/equality evaluation (return
1/0asint) - Mixed-type comparisons (int vs. float coercion)
- Type assignment violations (float->int truncation, int->float promotion)
- Uninitialized variable detection
- State integrity after exceptions
- Deep nested block execution (50-100 levels)
- Large expression stress (100-1000 term chains)
- Sequential control flow integrity
- Printf ordering guarantee
- Division edge cases (int/int, float/int, division by zero)
- Variable lifetime after block exit
- Symbol table isolation between runs
The project documents 18 design decisions in docs/design_decisions.md. Key highlights:
| # | Decision | Choice | Rationale |
|---|---|---|---|
| 1 | Lexer implementation | Hand-written DFA | Educational transparency, explicit error control |
| 2 | EOF handling | Explicit EOF token | Simplifies parser termination, aligns with FOLLOW sets |
| 4 | Parsing strategy | LL(1) recursive descent | Educational value, easy debugging, no backtracking |
| 5 | Precedence encoding | 4-level expression hierarchy | Matches C semantics, correct AST construction |
| 7 | Interpretation pattern | Visitor pattern | Separates traversal from execution, extensible |
| 8 | Symbol table structure | Stack of scope dictionaries | Efficient enter/exit, matches block scoping |
| 9 | Truthiness | C-style (0 = false) | Matches C semantics |
| 10 | Type coercion | Implicit (int<->float) | Matches C semantics for assignment and arithmetic |
| 11 | Uninitialized variables | Raise SemanticError | Prevents undefined behavior (stricter than C) |
| 12 | Integer division | Floor division for int/int | Matches C semantics |
| 13 | Error strategy | Fail-fast (first error only) | Simpler implementation, standard for educational compilers |
| 15 | Testing approach | Test-driven development (TDD) | Robustness, 293 tests across all components |
| Limitation | Description | Mitigation |
|---|---|---|
| Recursion depth | Python's default recursion limit (~1000) can be exceeded by deeply nested expressions (>500 levels) | Increase limit with sys.setrecursionlimit() or restructure deeply nested code |
| No loop constructs | while and for loops are not supported (per language specification) |
Future extension possible |
| No functions | User-defined functions are not supported | Only built-in printf() is available |
| No arrays/pointers | Aggregate types are not supported | Future extension possible |
| Single-error reporting | Parser stops on first syntax error (fail-fast) | Re-run after fixing each error |
| No string variables | String literals are only supported as printf() arguments |
Strings cannot be stored in variables |
| Component | Time Complexity | Space Complexity |
|---|---|---|
| Lexer | O(n) single-pass |
O(n) token stream |
| Parser | O(n) single-pass |
O(n) AST |
| Interpreter | O(n) tree traversal |
O(s) symbol table (s = scope depth) |
| Symbol table lookup | O(d) per lookup (d = nesting depth) |
O(v) (v = total variables) |
Benchmarks (tested on the project test suite):
- 1,000-term expression chain: parses and evaluates in < 15 seconds
- 10,000 variable declarations: symbol table handles in < 2 seconds
- 50-200 levels of nested scopes: no stack overflow, completes in milliseconds
- Full 293-test suite: completes in ~2.5 seconds
Contributions are welcome. Please follow these guidelines:
- Fork the repository and create a feature branch.
- Write tests for any new functionality (maintain 100% test pass rate).
- Follow existing code style -- type hints, docstrings, consistent naming.
- Run the full test suite before submitting:
python -m pytest -v
- Submit a pull request with a clear description of the changes.
git clone https://github.com/ChethiyaB/C-Lite-Interpreter.git
cd C-Lite-Interpreter
pip install -r requirements.txt
python -m pytest # Verify all 293 tests passThis project is licensed under the MIT License. See LICENSE for details.
MIT License
Copyright (c) 2026 Chethiya Bandara
- References:
- Sebesta, R.W. -- Concepts of Programming Languages (12th Edition)
- Scott, M.L. -- Programming Language Pragmatics (4th Edition)
