This document outlines the core architectural principles of the Sysclone universal execution engine and details the specific implementation choices made for its first target: the QBasic/MS-DOS environment.
Sysclone is designed as a language-agnostic virtual machine. The architecture is strictly split into three decoupled layers:
- Front-end (Language Specific): Monadic parsers that transform raw source code into a standardized Abstract Syntax Tree (AST), and Tokenizers for UI syntax highlighting.
- Middle-end (Virtual CPU): An asynchronous tree-walking evaluator that treats the AST as a stream of instructions, clocked so that it never freezes the browser UI.
- Back-end (Hardware Abstraction Layer): A set of virtualized hardware components (VGA, RAM, I/O, Audio) that provide the necessary environment for legacy software to run without modifications.
This section details the universal mechanics that power Sysclone, regardless of the language being executed.
Rather than using heavy parser generators (ANTLR, PEG.js) or unreadable regular expressions, we opted for custom-built monadic parser combinators (monad.js).
- Why? This allows us to build a modular, composable, and unit-testable grammar brick by brick, entirely from pure functions. It natively supports dynamic keyword dictionaries and lazy evaluation for recursive grammar rules.
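As an illustration of the combinator approach, here is a minimal sketch. It assumes that a parser is a pure function from `(input, pos)` to a result `{ ok, value, pos }`. The helper names (`str`, `charIn`, `seq`, `alt`, `many`, `map`, `lazy`) and the result shape are hypothetical and are not taken from monad.js. `lazy` shows how a recursive rule can refer to itself before it is defined.

```javascript
// A parser is a pure function (input, pos) -> { ok, value, pos }.
// These names and the result shape are illustrative, not monad.js's API.
const success = (value, pos) => ({ ok: true, value, pos });
const failure = (pos) => ({ ok: false, value: null, pos });

// Match a literal, case-insensitively (QBasic keywords ignore case).
const str = (s) => (input, pos) =>
  input.slice(pos, pos + s.length).toUpperCase() === s.toUpperCase()
    ? success(s.toUpperCase(), pos + s.length)
    : failure(pos);

// Match a single character accepted by a (non-global) regex.
const charIn = (re) => (input, pos) =>
  pos < input.length && re.test(input[pos]) ? success(input[pos], pos + 1) : failure(pos);

// Run parsers in sequence; the whole sequence fails if any part fails.
const seq = (...ps) => (input, pos) => {
  const values = [];
  let cur = pos;
  for (const p of ps) {
    const r = p(input, cur);
    if (!r.ok) return failure(pos);
    values.push(r.value);
    cur = r.pos;
  }
  return success(values, cur);
};

// Try alternatives in order; the first success wins.
const alt = (...ps) => (input, pos) => {
  for (const p of ps) {
    const r = p(input, pos);
    if (r.ok) return r;
  }
  return failure(pos);
};

// Zero or more repetitions (stops on failure or a zero-width match).
const many = (p) => (input, pos) => {
  const values = [];
  for (;;) {
    const r = p(input, pos);
    if (!r.ok || r.pos === pos) return success(values, pos);
    values.push(r.value);
    pos = r.pos;
  }
};

// Transform the value of a successful parse.
const map = (p, f) => (input, pos) => {
  const r = p(input, pos);
  return r.ok ? success(f(r.value), r.pos) : r;
};

// Defer construction so a rule can reference itself recursively.
const lazy = (thunk) => (input, pos) => thunk()(input, pos);

// Brick by brick: integers, parenthesised expressions, and sums.
const ws = many(charIn(/\s/));
const int = map(seq(charIn(/\d/), many(charIn(/\d/))), ([d, ds]) => Number(d + ds.join("")));
const atom = alt(int, map(seq(str("("), ws, lazy(() => expr), ws, str(")")), (v) => v[2]));
const expr = map(
  seq(atom, many(map(seq(ws, str("+"), ws, atom), (v) => v[3]))),
  ([first, rest]) => rest.reduce((a, b) => a + b, first)
);
```

Each rule is an ordinary value that can be unit-tested on its own. For example, `expr("1 + (2+3) + 4", 0)` evaluates the whole string to 10.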
To provide real-time syntax highlighting without coupling the Web IDE to a specific language, the engine strictly separates AST compilation from visual presentation.
- Architectural Choice: We implemented a Strategy Pattern via a BaseTokenizer abstract class. The UI relies entirely on a universal semantic contract (TokenTypes such as KEYWORD, BUILTIN, NUMBER) and ignores how the underlying text is parsed.
- Declarative Mapping (DRY): Language-specific implementations simply inject the compiler's pure monadic lexers into an array-based pipeline. This guarantees zero-dependency, lightning-fast syntax highlighting that evolves automatically alongside the language grammar.
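The sketch below illustrates the Strategy split. It assumes every lexer has the shape `(input, pos) -> { ok, pos }`. The `TokenTypes` values, the `rules` getter, `tokenizeLine`, and the small keyword dictionaries are hypothetical. The stand-in lexers would be replaced by the compiler's own monadic lexers.

```javascript
// Universal semantic contract consumed by the UI (values are illustrative).
const TokenTypes = Object.freeze({
  KEYWORD: "keyword", BUILTIN: "builtin", NUMBER: "number",
  STRING: "string", COMMENT: "comment", TEXT: "text",
});

// Strategy base class: subclasses only declare an ordered pipeline of
// [lexer, tokenType] pairs; the scanning loop is shared by every language.
class BaseTokenizer {
  get rules() {
    throw new Error("subclass must provide rules");
  }

  tokenizeLine(line) {
    const tokens = [];
    let pos = 0;
    while (pos < line.length) {
      let matched = false;
      for (const [lexer, type] of this.rules) {
        const r = lexer(line, pos);
        if (r.ok && r.pos > pos) {
          tokens.push({ type, text: line.slice(pos, r.pos) });
          pos = r.pos;
          matched = true;
          break;
        }
      }
      if (!matched) {
        // Unrecognised character: emit it as plain text and move on.
        tokens.push({ type: TokenTypes.TEXT, text: line[pos] });
        pos += 1;
      }
    }
    return tokens;
  }
}

// Stand-in lexers with the (input, pos) -> { ok, pos } shape.
const readWord = (input, pos) => {
  let end = pos;
  while (end < input.length && /[A-Za-z0-9$]/.test(input[end])) end++;
  return end;
};
const wordIn = (dict) => (input, pos) => {
  const end = readWord(input, pos);
  const ok = end > pos && dict.has(input.slice(pos, end).toUpperCase());
  return { ok, pos: ok ? end : pos };
};
const identifier = (input, pos) => {
  const end = readWord(input, pos);
  return { ok: end > pos, pos: end };
};
const digits = (input, pos) => {
  let end = pos;
  while (end < input.length && input[end] >= "0" && input[end] <= "9") end++;
  return { ok: end > pos, pos: end };
};
const comment = (input, pos) =>
  input[pos] === "'" ? { ok: true, pos: input.length } : { ok: false, pos };
const stringLit = (input, pos) => {
  if (input[pos] !== '"') return { ok: false, pos };
  const close = input.indexOf('"', pos + 1);
  return { ok: true, pos: close === -1 ? input.length : close + 1 };
};

// Declarative, array-based pipeline: order decides priority.
const QBASIC_RULES = [
  [comment, TokenTypes.COMMENT],
  [stringLit, TokenTypes.STRING],
  [wordIn(new Set(["PRINT", "FOR", "TO", "NEXT", "IF", "THEN"])), TokenTypes.KEYWORD],
  [wordIn(new Set(["CHR$", "LEN", "RND"])), TokenTypes.BUILTIN],
  [digits, TokenTypes.NUMBER],
  [identifier, TokenTypes.TEXT],
];

class QBasicTokenizer extends BaseTokenizer {
  get rules() {
    return QBASIC_RULES;
  }
}
```

The UI only ever sees `{ type, text }` pairs, so a `PascalTokenizer` would need nothing more than a different rules array.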
The main challenge on the Web is keeping the main thread responsive during infinite loops or hardware pauses.
- Architectural Choice: We use JavaScript Generators (function* and yield*) instead of Promises (async/await).
- Why? Every await schedules a microtask, and the browser drains the entire Microtask Queue before it renders. An interpreter loop built on async/await therefore starves the framerate and overloads the Garbage Collector with short-lived promises. Generators instead provide a fast synchronous iterator in which each yield acts as a clock "Tick", and the browser's frame loop governs how many cycles are allocated per frame.
- Hardware Interrupts: When the CPU hits a blocking instruction, the generator yields a SYS_DELAY. The Orchestrator safely exits the execution loop, lets the browser render the frame, and seamlessly schedules the resumption of the Virtual CPU.
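The following sketch shows the clocking model. It assumes that `SYS_DELAY` is a `Symbol`, that blocking statements yield `{ signal, ms }`, and that a per-frame tick budget applies. The names `execStatement`, `runSlice`, and `schedule`, along with the statement shape, are hypothetical. `yield*` delegates each statement to its own sub-generator.

```javascript
// Sentinel yielded by instructions that must wait (e.g. SLEEP).
const SYS_DELAY = Symbol("SYS_DELAY");

// Hypothetical per-statement generator: one yield = one clock tick.
function* execStatement(stmt, out) {
  if (stmt.op === "PRINT") {
    out.push(stmt.arg);
    yield;
  } else if (stmt.op === "SLEEP") {
    yield { signal: SYS_DELAY, ms: stmt.arg };
  }
}

// The evaluator delegates to sub-generators with yield*.
function* runProgram(statements, out) {
  for (const stmt of statements) yield* execStatement(stmt, out);
}

// Run at most `budget` ticks, then hand control back to the host.
// The returned state tells the host how to schedule the next slice.
function runSlice(cpu, budget) {
  for (let i = 0; i < budget; i++) {
    const { value, done } = cpu.next();
    if (done) return { state: "halted" };
    if (value && value.signal === SYS_DELAY) return { state: "delay", ms: value.ms };
  }
  return { state: "budget" };
}

// Browser host (not exercised outside a browser): one slice per frame,
// with a timer bridging SYS_DELAY pauses.
function schedule(cpu, budget = 10000) {
  const frame = () => {
    const r = runSlice(cpu, budget);
    if (r.state === "budget") requestAnimationFrame(frame);
    else if (r.state === "delay") setTimeout(() => requestAnimationFrame(frame), r.ms);
  };
  requestAnimationFrame(frame);
}
```

Because `runSlice` is plain synchronous code, it runs unchanged in Node.js or a Web Worker. Only `schedule` touches browser timing APIs.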
The system interacts with virtualized hardware via strict Dependency Injection. Memory, Video, and I/O controllers act as headless modules, meaning the entire Sysclone engine could run in a Node.js terminal or a Web Worker without any DOM dependencies.
The following architectural choices were made to ensure a "purist" emulation of the QBasic 4.5 / DOS 5.0 environment, scaling from basic text mode to advanced graphics.
QBasic does not use semicolons as statement terminators; line breaks (\n) are semantic separators. The parser uses an optimized eos (End Of Statement) token to cleanly consume line breaks, comments ('), and colons (:), actively preventing Catastrophic Backtracking (ReDoS) on large legacy files.
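Here is a hand-written sketch of such an eos rule. It assumes the parser tracks an integer position into the source string. The `eos` name and the `-1` failure convention are hypothetical, and `REM` comments are omitted for brevity. Each character is examined at most once, so the rule runs in linear time and has nothing to backtrack over.

```javascript
// Consume one "end of statement": blanks, then any run of colons, line
// breaks, and ' comments. Returns the position of the next statement,
// or -1 if no statement boundary is present at `pos`.
function eos(src, pos) {
  let sawSeparator = false;
  while (pos < src.length) {
    const c = src[pos];
    if (c === " " || c === "\t" || c === "\r") {
      pos++;
    } else if (c === ":" || c === "\n") {
      sawSeparator = true;
      pos++;
    } else if (c === "'") {
      // A comment runs to the end of the line; the newline is consumed
      // on the next iteration as a separator.
      while (pos < src.length && src[pos] !== "\n") pos++;
      sawSeparator = true;
    } else {
      break;
    }
  }
  if (pos >= src.length) return pos; // end of file also ends a statement
  return sawSeparator ? pos : -1;
}
```

For example, after parsing `PRINT 1` in `PRINT 1 : PRINT 2`, calling `eos` at the following space skips the colon and surrounding blanks and lands on the second `PRINT`.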
- 3-Tier Flat Scoping: Managed by a hierarchical chain of environments: Shared (Global Root), Main (Program execution), and Local (Subroutines). The engine strictly enforces MS-DOS Flat Scoping, actively blocking JavaScript's native lexical traversal.
- Static Vault: Subroutines use a decoupled persistentVars map (STATIC) to retain state across calls without leaking into the global tier.
- Memory Aliasing & Variable Vaults: MS-DOS allows accessing a typed variable (e.g., DIM Name AS STRING) with or without its type suffix (Name$). To prevent memory desynchronization, Sysclone uses an internal VariableVault class. This smart map intercepts all DIM, STATIC, and SHARED declarations to register an internal alias. The Virtual CPU traverses the 3-Tier scope strictly using base names, while the vault routes reads and writes to the correct suffixed memory slot, preserving strict MS-DOS aliasing behavior without polluting the engine's core logic.
- Memory Classes & In-Place Mutation:
  - The QArray class flattens multi-dimensional arrays, with their declared bounds, into a single 1D backing array.
  - The QFixedString class implements locked-memory block allocations (STRING * N), designed for O(1) in-place mutations that avoid allocating new strings and so shield the Garbage Collector.
- Duck-Typed Deep Cloning: User-Defined Types (UDTs) use an internal deep cloning mechanism during ASSIGN and SWAP instructions to ensure purist value-type behavior.
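The aliasing idea behind the VariableVault can be sketched as follows. `DIM Name AS STRING` registers the base name `NAME` as an alias for the canonical slot `NAME$`, so both spellings reach the same memory. The method names (`declare`, `resolve`, `get`, `set`, `has`) are hypothetical. Implicit typing (DEFINT and similar) and the handling of a suffix that conflicts with the declared type are out of scope here; such names are simply left unaliased.

```javascript
// Type suffixes recognised by QBasic.
const SUFFIX_FOR_TYPE = { STRING: "$", INTEGER: "%", LONG: "&", SINGLE: "!", DOUBLE: "#" };

// Strip a trailing type suffix: "Name$" -> "NAME".
const baseName = (name) => name.toUpperCase().replace(/[$%&!#]$/, "");

class VariableVault {
  constructor() {
    this.slots = new Map();   // canonical suffixed key -> value
    this.aliases = new Map(); // base name -> canonical suffixed key
  }

  // Called for DIM / STATIC / SHARED declarations.
  declare(name, type, initial) {
    const key = baseName(name) + SUFFIX_FOR_TYPE[type];
    this.aliases.set(baseName(name), key);
    this.slots.set(key, initial);
    return key;
  }

  // Map either spelling (NAME or NAME$) to the canonical slot.
  resolve(name) {
    const upper = name.toUpperCase();
    const declared = this.aliases.get(baseName(upper));
    if (declared && (upper === baseName(upper) || upper === declared)) return declared;
    return upper; // undeclared or differently suffixed: not aliased here
  }

  has(name) { return this.slots.has(this.resolve(name)); }
  get(name) { return this.slots.get(this.resolve(name)); }
  set(name, value) { this.slots.set(this.resolve(name), value); }
}
```

The scope chain can then ask `vault.has("NAME")` using base names only, and the vault decides which suffixed slot is meant.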
Old DOS games use POKE to write directly into system RAM.
- Architectural Choice: We implemented a true 1 MB virtual RAM stick (memory.js) addressable via Segment:Offset.
- Memory-Mapped I/O: The architecture intercepts writes to vital addresses (e.g., &H41A, the BIOS keyboard buffer head pointer) and drives the virtual peripherals accordingly.
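The following sketch covers Segment:Offset addressing and a write hook. It uses real-mode address arithmetic: physical = segment × 16 + offset, wrapped to 20 bits as on an 8086. The class name `VirtualRAM` and the `onWrite`/`poke`/`peek` methods are hypothetical and do not describe memory.js's actual API.

```javascript
// 1 MB real-mode address space with memory-mapped I/O hooks.
class VirtualRAM {
  constructor() {
    this.bytes = new Uint8Array(0x100000);
    this.watchers = new Map(); // physical address -> callback(value)
  }

  // physical = segment * 16 + offset, wrapped to 20 bits.
  static physical(segment, offset) {
    return ((segment << 4) + offset) & 0xfffff;
  }

  // Register a hook that fires whenever a byte is written at `address`.
  onWrite(address, callback) {
    this.watchers.set(address, callback);
  }

  poke(segment, offset, value) {
    const addr = VirtualRAM.physical(segment, offset);
    this.bytes[addr] = value & 0xff;
    const hook = this.watchers.get(addr);
    if (hook) hook(this.bytes[addr]);
  }

  peek(segment, offset) {
    return this.bytes[VirtualRAM.physical(segment, offset)];
  }
}
```

Segment 0x40 offset 0x1A and segment 0 offset 0x41A resolve to the same physical byte. A hook on 0x41A can therefore catch the classic idiom `DEF SEG = 0: POKE &H41A, PEEK(&H41C)`, which copies the BIOS tail pointer into the head pointer, and translate it into flushing the virtual keyboard FIFO.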
- Smart Heuristic Decoder: Sysclone automatically detects whether a loaded file is Raw DOS binary (CP437) or corrupted GitHub Mojibake (Windows-1252), repairing the latter on the fly.
- Architectural Boundary: The AST and Virtual CPU operate entirely in standard JS Unicode. The strict translation to 8-bit CP437 bytes occurs only at the Hardware boundary (VGA VRAM), ensuring absolute block character fidelity.
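The following sketch shows the translation at the hardware boundary: a JS string goes in and CP437 bytes destined for text-mode VRAM come out. The table is deliberately partial and covers only common block and box-drawing glyphs; a complete implementation would map all 128 high code points. The `toCP437` name and the `'?'` fallback are assumptions. The Mojibake repair path is not shown.

```javascript
// Partial Unicode -> CP437 table (block and box-drawing characters).
const UNICODE_TO_CP437 = new Map([
  ["\u2591", 0xb0], ["\u2592", 0xb1], ["\u2593", 0xb2], ["\u2502", 0xb3],
  ["\u2551", 0xba], ["\u2557", 0xbb], ["\u255d", 0xbc], ["\u2510", 0xbf],
  ["\u2514", 0xc0], ["\u2500", 0xc4], ["\u255a", 0xc8], ["\u2554", 0xc9],
  ["\u2550", 0xcd], ["\u2518", 0xd9], ["\u250c", 0xda], ["\u2588", 0xdb],
  ["\u2584", 0xdc], ["\u2580", 0xdf],
]);

// Encode a (BMP-only) JS string into CP437 bytes for VRAM.
// ASCII passes through unchanged; unmapped characters become '?' (0x3F).
function toCP437(text) {
  const out = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    const code = ch.charCodeAt(0);
    out[i] = code < 0x80 ? code : (UNICODE_TO_CP437.get(ch) ?? 0x3f);
  }
  return out;
}
```

Keeping this conversion at the VRAM write means the parser, AST, and evaluator never handle 8-bit code page values.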
- Hardware Typography Engine: To emulate VGA text mode (80x25), the standard HTML canvas fillText was rejected to avoid anti-aliasing artifacts. Rendering uses bitwise operators to push pixels directly into a raw ImageData buffer.
- Multi-Mode Architecture: The video routing uses a Strategy pattern delegating to a Template Method hierarchy (TextModeDriver and GraphicsModeDriver).
- Algorithmic Purity: The PAINT command uses a flat, stack-based Flood Fill algorithm to prevent Call Stack Overflow. The CIRCLE command employs a purist 4-Connected Midpoint Algorithm, eradicating Moiré pattern gaps.
- Pixel Aspect Ratio (PAR): The VGA router dynamically calculates the CSS canvas scaling against a fixed UI baseline, guaranteeing perfectly round circles regardless of the active QBasic SCREEN mode (e.g., Mode 9 at 640x350).
- Coordinate Projection: The engine faithfully implements the Cartesian (WINDOW) vs. Screen (WINDOW SCREEN) inversion logic and the MS-DOS auto-sorting quirk for bounding boxes.
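Here is a sketch of a flat, stack-based fill in the style of `PAINT (x, y), paintColor, borderColor`. It assumes an indexed-color, row-major buffer holding one byte per pixel. The `paint` signature is hypothetical. This is the simplest 4-connected variant; scanline fills are a common faster alternative, and Sysclone's own implementation may differ.

```javascript
// Iterative 4-connected boundary fill. An explicit array stack replaces
// recursion, so large regions cannot overflow the JS call stack.
// As in QBasic, the border color defaults to the paint color.
function paint(pixels, width, height, x, y, paintColor, borderColor = paintColor) {
  const stack = [[x, y]];
  while (stack.length > 0) {
    const [px, py] = stack.pop();
    if (px < 0 || py < 0 || px >= width || py >= height) continue;
    const i = py * width + px;
    const c = pixels[i];
    // Stop at the border and at pixels that are already painted.
    if (c === borderColor || c === paintColor) continue;
    pixels[i] = paintColor;
    stack.push([px + 1, py], [px - 1, py], [px, py + 1], [px, py - 1]);
  }
}
```

Because pixels are checked when popped rather than when pushed, the stack may briefly hold duplicates. The fill still terminates because each pixel is painted only once.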
- Anti-Ghosting Keyboard: Implements an authentic 16-byte FIFO BIOS buffer that strictly filters browser e.repeat events, preventing fast-paced games from consuming trailing inputs.
- PC Speaker: Maps SOUND and PLAY commands to the native Web Audio API via a scheduled macro-language parser.
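The following sketch shows the keyboard side. It is a fixed-capacity ring buffer fed from `keydown` events, with auto-repeat events dropped. The capacity of 16 slots, the class and method names, and the use of `e.key` as the stored value are assumptions. Following the PC BIOS head/tail convention, one slot stays empty to distinguish full from empty, so 16 slots hold 15 keystrokes.

```javascript
// Ring buffer standing in for the BIOS type-ahead buffer.
class KeyboardFIFO {
  constructor(slots = 16) {
    this.buf = new Array(slots);
    this.head = 0; // next slot to read
    this.tail = 0; // next slot to write
  }

  get size() {
    return (this.tail - this.head + this.buf.length) % this.buf.length;
  }

  // Accepts a KeyboardEvent-like object { key, repeat }.
  // Auto-repeat events are ignored; keystrokes are dropped when full.
  onKeyDown(e) {
    if (e.repeat) return false;
    const next = (this.tail + 1) % this.buf.length;
    if (next === this.head) return false;
    this.buf[this.tail] = e.key;
    this.tail = next;
    return true;
  }

  // INKEY$-style read: returns "" when the buffer is empty.
  read() {
    if (this.head === this.tail) return "";
    const key = this.buf[this.head];
    this.head = (this.head + 1) % this.buf.length;
    return key;
  }
}
```

In the browser this would be wired as `addEventListener("keydown", (e) => fifo.onKeyDown(e))`. A held-down key then contributes a single keystroke, so a game does not keep reacting to it after the key is released.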
To maintain a purist, backend-free architecture, the WebVM uses the browser's window.location.hash for the application state. Changing the hash automatically triggers the Virtual CPU to halt, flush VRAM, and load the corresponding interpreted program, enabling instant stateless sharing.
Screen capturing and GIF recording (at 24 FPS to catch MS-DOS stroboscopic XOR sprites) are offloaded to a dedicated Web Worker. This ensures the heavy LZW compression algorithms do not block the Virtual CPU's requestAnimationFrame loop.
To guarantee absolute historical fidelity and prevent AI Agent hallucinations during development, Sysclone employs a "Truth Vectors" pipeline:
- JSON Vectors: Atomic definitions of legacy behaviors (inputs, outputs, memory states).
- Code Generation: A custom tool (build_truth.js) parses these vectors to simultaneously generate:
  - A native QBasic auto-test framework (compat.bas) to verify behavior on real MS-DOS hardware.
  - The JavaScript integration test suites for the JIT engine.
  - The QBASIC_REFERENCE.MD documentation.
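The sketch below illustrates the single-source idea: one vector list rendered three ways. The vector schema (`{ id, expr, expected }`), the function names, and the `evalQB`/`assertEqual` helpers referenced in the generated JS are all hypothetical and are not build_truth.js's real format. QBasic string literals are assumed to contain no double quotes.

```javascript
// Hypothetical vector shape: a QBasic expression and its expected value.
const vectors = [
  { id: "LEN_ABC", expr: 'LEN("ABC")', expected: 3 },
  { id: "UCASE", expr: 'UCASE$("qb")', expected: "QB" },
];

// Render a value as a QBasic literal (strings assumed quote-free).
const qbLiteral = (v) => (typeof v === "string" ? `"${v}"` : String(v));

// One self-checking line per vector, to run on real MS-DOS hardware.
function toBasic(vs) {
  return vs
    .map((v) => `IF ${v.expr} = ${qbLiteral(v.expected)} THEN PRINT "PASS ${v.id}" ELSE PRINT "FAIL ${v.id}"`)
    .join("\n");
}

// Matching JS test source (evalQB and assertEqual are hypothetical).
function toJsTests(vs) {
  return vs
    .map((v) => `assertEqual(evalQB(${JSON.stringify(v.expr)}), ${JSON.stringify(v.expected)}, ${JSON.stringify(v.id)});`)
    .join("\n");
}

// Markdown reference table for the documentation.
function toReference(vs) {
  const rows = vs.map((v) => `| ${v.id} | ${v.expr} | ${qbLiteral(v.expected)} |`);
  return ["| Vector | Expression | Expected |", "| --- | --- | --- |", ...rows].join("\n");
}
```

Because all three outputs come from the same JSON, a behavior observed on real hardware and the engine's test expectation cannot silently drift apart.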
To support fundamentally different languages (e.g., QBasic's flat scoping vs. Pascal's lexical scoping) without creating a monolithic, unmaintainable core, the engine is strictly decoupled. However, early attempts to create a fully agnostic VirtualCPU paired with the Visitor Pattern were rejected in favor of specialized Evaluators.
Initially, the plan was to extract the syntax rules into a Visitor and leave the Control Flow (Jumps, Ticks, Interrupts) in an agnostic VirtualCPU. This failed because MS-DOS QBasic uses highly irregular control flows (like GOTO breaking out of blocks, or RETURN bubbling up from GOSUB), whereas languages like Turbo Pascal use strict procedural lexical scoping.
- Architectural Choice: We employ specialized, monolithic evaluators (QBasicEvaluator and eventually PascalEvaluator).
- Why? It ensures that the execution loop natively understands the unique control flow quirks of its language. It also protects the integrity of sequential state pointers (like the READ pointer into DATA blocks) without forcing unnatural abstractions onto other languages.
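This sketch shows why the program counter belongs to the language-specific evaluator. GOTO and GOSUB can jump anywhere in a flat statement list, and RETURN pops a dedicated return stack. The statement shape, the label table, and the `run`/`target` methods are hypothetical. Line numbers, `ON ... GOSUB`, and most other statements are omitted. The two error messages match QBasic's own wording.

```javascript
// Minimal QBasic-style evaluator over a flattened statement list.
class QBasicEvaluator {
  constructor(program) {
    this.program = program;
    this.labels = new Map(); // label -> statement index, built at load time
    program.forEach((s, i) => {
      if (s.label) this.labels.set(s.label, i);
    });
    this.returnStack = []; // GOSUB return addresses
    this.output = [];
  }

  *run() {
    let pc = 0;
    while (pc < this.program.length) {
      const s = this.program[pc];
      pc++;
      switch (s.op) {
        case "PRINT": this.output.push(s.arg); break;
        case "GOTO": pc = this.target(s.arg); break;
        case "GOSUB": this.returnStack.push(pc); pc = this.target(s.arg); break;
        case "RETURN":
          if (this.returnStack.length === 0) throw new Error("RETURN without GOSUB");
          pc = this.returnStack.pop();
          break;
        case "END": return;
      }
      yield; // one clock tick per statement
    }
  }

  target(label) {
    if (!this.labels.has(label)) throw new Error("Label not defined");
    return this.labels.get(label);
  }
}
```

A Pascal evaluator would instead structure calls around nested procedure frames. That is the mismatch that made a single shared control-flow core impractical.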
QBasic and Pascal have radically different memory management philosophies. We use inheritance to isolate these paradigms:
- BaseEnvironment: The abstract foundation. It holds a basic symbol table, the universal DATA bank, and the critical reference to the Hardware Abstraction Layer's Memory (the 1 MB virtual RAM stick).
- QBasicEnvironment: Inherits from Base. Strictly implements:
  - 3-Tier Flat Scoping: Forcing Global/Main/Local isolation and actively preventing JS lexical traversal.
  - The VariableVault: Handling the implicit memory aliasing between explicitly typed variables (e.g., DIM A AS STRING) and their suffixed counterparts (A$).
  - Implicit Typing: Handling the DEFINT A-Z first-letter routing rules.
- PascalEnvironment: Inherits from Base. Strictly implements:
  - Lexical Scoping: Native hierarchical traversal (this.parent) to support nested procedures.
  - The Heap Allocator: A memory manager that maps Turbo Pascal typed pointers (^) directly to physical addresses within the HAL's Uint8Array RAM.
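The following sketch shows the inheritance split and the two lookup rules. QBasic scopes see only their own variables plus the SHARED root, while Pascal scopes walk `this.parent` outward. Method names and constructor arguments are hypothetical. The VariableVault and the heap allocator are left out. Unsuffixed names without a DEFtype rule default to SINGLE, as in QBasic.

```javascript
// Type suffixes recognised by QBasic.
const SUFFIX_TYPES = { "$": "STRING", "%": "INTEGER", "&": "LONG", "!": "SINGLE", "#": "DOUBLE" };

// Shared foundation: symbol table, DATA bank, and the HAL memory reference.
class BaseEnvironment {
  constructor(memory, parent = null) {
    this.memory = memory; // e.g. the HAL's Uint8Array RAM
    this.parent = parent;
    this.vars = new Map();
    this.dataBank = parent ? parent.dataBank : [];
  }
  define(name, value) {
    this.vars.set(name, value);
  }
  lookup(name) {
    throw new Error("subclass must implement its scoping rule");
  }
}

// QBasic: own variables plus the SHARED root; never the caller's scope.
class QBasicEnvironment extends BaseEnvironment {
  constructor(memory, shared = null) {
    super(memory, null); // deliberately no lexical parent
    this.shared = shared;
    if (shared) this.dataBank = shared.dataBank;
    this.defTypes = new Map(); // first letter -> type (DEFINT A-Z, ...)
  }
  lookup(name) {
    if (this.vars.has(name)) return this.vars.get(name);
    if (this.shared && this.shared.vars.has(name)) return this.shared.vars.get(name);
    return undefined;
  }
  defType(from, to, type) {
    for (let c = from.charCodeAt(0); c <= to.charCodeAt(0); c++) {
      this.defTypes.set(String.fromCharCode(c), type);
    }
  }
  typeOf(name) {
    const suffixType = SUFFIX_TYPES[name[name.length - 1]];
    if (suffixType) return suffixType;
    return this.defTypes.get(name[0].toUpperCase()) ?? "SINGLE";
  }
}

// Pascal: ordinary lexical scoping along the parent chain.
class PascalEnvironment extends BaseEnvironment {
  lookup(name) {
    for (let env = this; env !== null; env = env.parent) {
      if (env.vars.has(name)) return env.vars.get(name);
    }
    return undefined;
  }
}
```

Both subclasses share the same constructor contract, so the orchestrator can inject the HAL memory without knowing which scoping rule is in play.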
The WebVM Orchestrator (webvm.js) acts as the dependency injector. Upon loading a file, it checks the extension (.bas vs .pas) and dynamically assembles the execution pipeline:
- Frontend: Injects the correct UI Tokenizer (QBasicTokenizer vs PascalTokenizer).
- State: Instantiates the correct environment (QBasicEnvironment vs PascalEnvironment).
- Middle-end: Instantiates the corresponding specialized evaluator (QBasicEvaluator).
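The following sketch shows extension-based assembly. A registry maps each file extension to the three classes the orchestrator injects, and the same HAL memory instance is passed down. The registry layout, `assemblePipeline`, and the empty stand-in classes are hypothetical. Parsing into an AST is assumed to have happened already. The Pascal entry is commented out because the source lists PascalEvaluator as future work.

```javascript
// Stand-in classes; the real ones live in their own modules.
class QBasicTokenizer {}
class QBasicEnvironment {
  constructor(memory) { this.memory = memory; }
}
class QBasicEvaluator {
  constructor(ast, env) { this.ast = ast; this.env = env; }
}

// Extension -> pipeline registry.
const PIPELINES = {
  ".bas": { Tokenizer: QBasicTokenizer, Environment: QBasicEnvironment, Evaluator: QBasicEvaluator },
  // ".pas": { Tokenizer: PascalTokenizer, Environment: PascalEnvironment, Evaluator: PascalEvaluator },
};

// Pick the pipeline by extension and wire the pieces together.
function assemblePipeline(filename, ast, memory) {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot).toLowerCase() : "";
  const entry = PIPELINES[ext];
  if (!entry) throw new Error(`No pipeline registered for "${ext}"`);
  const tokenizer = new entry.Tokenizer();
  const env = new entry.Environment(memory);
  const evaluator = new entry.Evaluator(ast, env);
  return { tokenizer, env, evaluator };
}
```

Adding a language then means registering one more entry rather than editing the orchestrator's control flow.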