Skip to content

Latest commit

 

History

History
389 lines (329 loc) · 19.6 KB

File metadata and controls

389 lines (329 loc) · 19.6 KB

PL/pgSQL Parser Scenarios

Goal: Hand-written PL/pgSQL recursive descent parser matching omni pg/parser architecture with full PostgreSQL PL/pgSQL compatibility Verification: Go unit tests — each scenario is a PL/pgSQL body string that must parse to the correct AST (or produce the correct error) Reference sources: PostgreSQL src/pl/plpgsql/src/pl_gram.y, plpgsql.h, pl_scanner.c, pl_reserved_kwlist.h, pl_unreserved_kwlist.h; PostgreSQL docs Chapter 43

Status: [ ] pending, [x] passing, [~] partial


Phase 1: Foundation

1.1 AST Node Types

  • PLBlock node — cmd_type, label, body (statement list), declarations, exception block
  • PLDeclare node — variable name, type reference, constant flag, not-null flag, collation, default expression
  • PLCursorDecl node — cursor name, scroll option, argument list, query text
  • PLAliasDecl node — alias name, referenced variable name
  • PLIf node — condition text, then-body, elsif list, else-body
  • PLCase node — test expression text, case-when list, else-body, has-test flag
  • PLLoop node — label, body
  • PLWhile node — label, condition text, body
  • PLForI node — label, variable, lower/upper/step expression texts, reverse flag, body
  • PLForS node — label, variable, query text, body
  • PLForC node — label, variable, cursor variable, arg query text, body
  • PLForDynS node — label, variable, query text, params list, body
  • PLForEachA node — label, variable, slice dimension, array expression text, body
  • PLReturn node — expression text (optional)
  • PLReturnNext node — expression text
  • PLReturnQuery node — query text (static), dynamic query text, params list
  • PLAssign node — variable target, expression text
  • PLExecSQL node — SQL text, into-target, strict flag
  • PLDynExecute node — query text, into-target, strict flag, params list
  • PLPerform node — expression text
  • PLCall node — SQL text (CALL or DO)
  • PLRaise node — level, condition name, sqlstate, message text, params, options list
  • PLAssert node — condition text, message text
  • PLGetDiag node — is-stacked flag, diagnostic items list
  • PLOpen node — cursor variable, query text, dynamic query text, params, scroll option
  • PLFetch node — cursor variable, direction, direction count, into-target, is-move flag
  • PLClose node — cursor variable
  • PLExit node — is-continue flag, label, condition text
  • PLCommit node — chain flag
  • PLRollback node — chain flag
  • PLNull node (null statement)
  • All nodes implement omni Node interface with Tag() method
  • All nodes carry Loc (start, end byte offsets)

1.2 Keywords and Tokens

  • Reserved keywords: ALL, BEGIN, BY, CASE, DECLARE, ELSE, END, EXECUTE, FOR, FOREACH, FROM, IF, IN, INTO, LOOP, NOT, NULL, OR, STRICT, THEN, TO, USING, WHEN, WHILE
  • Unreserved keywords recognized: ABSOLUTE, ALIAS, AND, ARRAY, ASSERT, BACKWARD, CALL, CHAIN, CLOSE, COLLATE, COLUMN, COLUMN_NAME, COMMIT, CONSTANT, CONSTRAINT, CONSTRAINT_NAME, CONTINUE, CURRENT, CURSOR, DATATYPE, DEBUG, DEFAULT, DETAIL, DIAGNOSTICS, DO, DUMP, ELSIF, ELSEIF, ERRCODE, ERROR, EXCEPTION, EXIT, FETCH, FIRST, FORWARD, GET, HINT, IMPORT, INFO, INSERT, IS, LAST, LOG, MERGE, MESSAGE, MESSAGE_TEXT, MOVE, NEXT, NO, NOTICE, OPEN, OPTION, PERFORM, PG_CONTEXT, PG_DATATYPE_NAME, PG_EXCEPTION_CONTEXT, PG_EXCEPTION_DETAIL, PG_EXCEPTION_HINT, PG_ROUTINE_OID, PRINT_STRICT_PARAMS, PRIOR, QUERY, RAISE, RELATIVE, RETURN, RETURNED_SQLSTATE, REVERSE, ROLLBACK, ROW_COUNT, ROWTYPE, SCHEMA, SCHEMA_NAME, SCROLL, SLICE, SQLSTATE, STACKED, TABLE, TABLE_NAME, TYPE, USE_COLUMN, USE_VARIABLE, VARIABLE_CONFLICT, WARNING
  • Unreserved keywords usable as variable names in declarations
  • Reserved keywords NOT usable as variable names
  • Keyword lookup is case-insensitive

1.3 Compiler Options

  • #option dump directive before block
  • #print_strict_params on directive
  • #print_strict_params off directive
  • #variable_conflict error directive
  • #variable_conflict use_variable directive
  • #variable_conflict use_column directive
  • Multiple directives before block
  • Unknown #option produces error

1.4 Parser Scaffold and Minimal Block

  • Entry point accepts PL/pgSQL function body string, returns AST root
  • Empty block: BEGIN END parses to PLBlock with empty body
  • Empty block with semicolons: BEGIN END; parses correctly
  • Block with DECLARE and no variables: DECLARE BEGIN END
  • Block with trailing label: <<lbl>> BEGIN END lbl
  • Label mismatch at END produces error
  • Mismatched END keyword (e.g. END IF in block) produces error
  • Unexpected token at top level produces error with position
  • Missing END produces error
  • Multiple statements in block separated by semicolons
  • Nested block: BEGIN BEGIN END; END
  • Labeled nested block: <<inner>> BEGIN END inner;
  • Label between DECLARE and BEGIN produces error: DECLARE <<lbl>> BEGIN END;
  • Extra DECLARE keyword within section: DECLARE DECLARE x int; BEGIN END parses correctly

1.5 DECLARE Section — Variable Declarations

  • Scalar: DECLARE x integer;
  • With default :=: DECLARE x integer := 0;
  • With default =: DECLARE x integer = 0;
  • With default DEFAULT: DECLARE x integer DEFAULT 0;
  • CONSTANT: DECLARE x CONSTANT integer := 42;
  • NOT NULL: DECLARE x integer NOT NULL := 1;
  • CONSTANT NOT NULL: DECLARE x CONSTANT integer NOT NULL := 1;
  • COLLATE: DECLARE x text COLLATE "en_US";
  • COLLATE with NOT NULL: DECLARE x text COLLATE "C" NOT NULL := 'a';
  • Record type: DECLARE r record;
  • Qualified type: DECLARE x public.my_type;
  • %TYPE reference: DECLARE x my_table.my_column%TYPE;
  • %ROWTYPE reference: DECLARE r my_table%ROWTYPE;
  • Multiple variables: DECLARE x int; y text; z boolean;
  • Cursor declaration: DECLARE c CURSOR FOR SELECT 1;
  • Cursor with SCROLL: DECLARE c SCROLL CURSOR FOR SELECT 1;
  • Cursor with NO SCROLL: DECLARE c NO SCROLL CURSOR FOR SELECT 1;
  • Cursor with parameters: DECLARE c CURSOR (p1 int, p2 text) FOR SELECT p1, p2;
  • Cursor with IS (synonym for FOR): DECLARE c CURSOR IS SELECT 1;
  • Alias declaration: DECLARE a ALIAS FOR $1;
  • Alias for named parameter: DECLARE a ALIAS FOR param_name;
  • Expression default spans until semicolon: DECLARE x int := 1 + 2 * 3;
  • Array type: DECLARE a integer[];
  • Array type with dimension: DECLARE a integer[10];
  • NOT NULL without default produces error: DECLARE x integer NOT NULL;
  • CONSTANT without default produces error: DECLARE x CONSTANT integer;

Phase 2: Control Flow

2.1 IF / ELSIF / ELSE

  • Simple IF-THEN-END IF: IF x > 0 THEN y := 1; END IF;
  • IF-THEN-ELSE: IF x > 0 THEN y := 1; ELSE y := 0; END IF;
  • IF-THEN-ELSIF-ELSE: IF x > 0 THEN y := 1; ELSIF x = 0 THEN y := 0; ELSE y := -1; END IF;
  • Multiple ELSIF branches: three or more ELSIF clauses
  • ELSEIF synonym: IF x THEN y := 1; ELSEIF z THEN y := 2; END IF;
  • Nested IF inside IF: IF a THEN IF b THEN c := 1; END IF; END IF;
  • Condition expression spans until THEN: IF x > 0 AND y < 10 THEN ...
  • Empty THEN body: IF x THEN END IF; — parses with empty body
  • Multiple statements in THEN body
  • Multiple statements in ELSE body
  • Missing THEN produces error
  • Missing END IF produces error

2.2 CASE Statement

  • Searched CASE: CASE WHEN x > 0 THEN y := 1; WHEN x = 0 THEN y := 0; END CASE;
  • Searched CASE with ELSE: CASE WHEN x > 0 THEN y := 1; ELSE y := -1; END CASE;
  • Simple CASE: CASE x WHEN 1 THEN y := 'a'; WHEN 2 THEN y := 'b'; END CASE;
  • Simple CASE with ELSE: CASE x WHEN 1 THEN y := 'a'; ELSE y := 'z'; END CASE;
  • Multiple WHEN branches
  • Multiple statements per WHEN body
  • Nested CASE inside CASE
  • Missing END CASE produces error
  • No WHEN clause produces error

2.3 Basic Loops — LOOP, WHILE

  • Infinite LOOP: LOOP x := x + 1; EXIT WHEN x > 10; END LOOP;
  • Labeled LOOP: <<myloop>> LOOP EXIT myloop; END LOOP myloop;
  • WHILE loop: WHILE x > 0 LOOP x := x - 1; END LOOP;
  • Labeled WHILE: <<w>> WHILE x > 0 LOOP x := x - 1; END LOOP w;
  • Condition expression spans until LOOP keyword
  • Nested loops: LOOP inside LOOP
  • Empty loop body: LOOP END LOOP;
  • Missing END LOOP produces error
  • WHILE missing LOOP keyword produces error

2.4 FOR Loops — All Variants

  • Integer FOR: FOR i IN 1..10 LOOP x := x + i; END LOOP;
  • Integer FOR REVERSE: FOR i IN REVERSE 10..1 LOOP x := x + i; END LOOP;
  • Integer FOR with BY: FOR i IN 1..100 BY 5 LOOP x := x + i; END LOOP;
  • Integer FOR REVERSE with BY: FOR i IN REVERSE 100..1 BY 10 LOOP END LOOP;
  • Integer FOR with expression bounds: FOR i IN a+1..b*2 LOOP END LOOP;
  • Query FOR: FOR rec IN SELECT * FROM t LOOP x := rec.a; END LOOP;
  • Query FOR with complex query: FOR rec IN SELECT a, b FROM t WHERE c > 0 ORDER BY a LOOP END LOOP;
  • Cursor FOR: FOR rec IN cur LOOP END LOOP; (cur is cursor variable)
  • Cursor FOR with arguments: FOR rec IN cur(1, 'a') LOOP END LOOP;
  • Dynamic FOR: FOR rec IN EXECUTE 'SELECT * FROM ' || tbl LOOP END LOOP;
  • Dynamic FOR with USING: FOR rec IN EXECUTE 'SELECT * FROM t WHERE id = $1' USING my_id LOOP END LOOP;
  • FOREACH ARRAY: FOREACH x IN ARRAY arr LOOP END LOOP;
  • FOREACH ARRAY with SLICE: FOREACH x SLICE 1 IN ARRAY arr LOOP END LOOP;
  • Labeled FOR: <<fl>> FOR i IN 1..10 LOOP END LOOP fl;
  • Label mismatch on FOR loop END produces error
  • Nested FOR loops

2.5 EXIT and CONTINUE

  • Bare EXIT: EXIT;
  • EXIT with label: EXIT myloop;
  • EXIT with WHEN: EXIT WHEN x > 10;
  • EXIT with label and WHEN: EXIT myloop WHEN x > 10;
  • Bare CONTINUE: CONTINUE;
  • CONTINUE with label: CONTINUE myloop;
  • CONTINUE with WHEN: CONTINUE WHEN x = 0;
  • CONTINUE with label and WHEN: CONTINUE myloop WHEN x = 0;
  • WHEN condition expression spans until semicolon

Phase 3: Statements

3.1 Variable Assignment

  • Simple assignment: x := 1;
  • Expression assignment: x := a + b * c;
  • Function call assignment: x := my_func(a, b);
  • Subquery assignment: x := (SELECT max(a) FROM t);
  • Record field assignment: rec.field := 42;
  • Array element assignment: arr[1] := 'hello';
  • Array slice assignment: arr[1:3] := ARRAY[1,2,3];
  • Multi-level field: rec.nested.field := 1;
  • Assignment with complex RHS spanning until semicolon
  • Assignment with = operator (PG also accepts this): x = 1;

3.2 RETURN Variants

  • Simple RETURN: RETURN;
  • RETURN with expression: RETURN x + 1;
  • RETURN with subquery: RETURN (SELECT count(*) FROM t);
  • RETURN NEXT: RETURN NEXT x;
  • RETURN NEXT with record: RETURN NEXT rec;
  • RETURN NEXT bare (for OUT params): RETURN NEXT;
  • RETURN QUERY static: RETURN QUERY SELECT * FROM t;
  • RETURN QUERY with WHERE: RETURN QUERY SELECT a, b FROM t WHERE c > 0;
  • RETURN QUERY EXECUTE: RETURN QUERY EXECUTE 'SELECT * FROM t';
  • RETURN QUERY EXECUTE with USING: RETURN QUERY EXECUTE 'SELECT * FROM t WHERE id = $1' USING my_id;
  • RETURN QUERY EXECUTE with multiple USING: RETURN QUERY EXECUTE $q$ SELECT $1, $2 $q$ USING a, b;
  • Expression text spans until semicolon for RETURN value

3.3 PERFORM and Bare SQL

  • PERFORM: PERFORM my_func(1, 2);
  • PERFORM with query: PERFORM * FROM t WHERE a > 0; (converted from SELECT)
  • Bare SQL — INSERT: INSERT INTO t VALUES (1, 2);
  • Bare SQL — UPDATE: UPDATE t SET a = 1 WHERE b = 2;
  • Bare SQL — DELETE: DELETE FROM t WHERE a = 1;
  • SQL with INTO: SELECT a, b INTO x, y FROM t WHERE id = 1;
  • SQL with INTO STRICT: SELECT a INTO STRICT x FROM t WHERE id = 1;
  • SQL statement text extracted until semicolon
  • SQL keywords (INSERT, UPDATE, DELETE, SELECT, MERGE, IMPORT) recognized as statement starters
  • Bare SQL — IMPORT: IMPORT FOREIGN SCHEMA s FROM SERVER srv INTO public;

3.4 EXECUTE Dynamic SQL

  • Basic EXECUTE: EXECUTE 'SELECT 1';
  • EXECUTE with INTO: EXECUTE 'SELECT a FROM t' INTO x;
  • EXECUTE with INTO STRICT: EXECUTE 'SELECT a FROM t' INTO STRICT x;
  • EXECUTE with USING: EXECUTE 'INSERT INTO t VALUES($1)' USING val;
  • EXECUTE with multiple USING: EXECUTE $q$ SELECT $1, $2 $q$ USING a, b;
  • EXECUTE with INTO and USING: EXECUTE 'SELECT a FROM t WHERE id=$1' INTO x USING my_id;
  • EXECUTE with USING before INTO: EXECUTE 'SELECT $1' USING a INTO x;
  • EXECUTE expression concatenation: EXECUTE 'SELECT * FROM ' || quote_ident(tbl);
  • EXECUTE with format(): EXECUTE format('SELECT * FROM %I', tbl);
  • Duplicate INTO produces error: EXECUTE 'SELECT 1' INTO x INTO y;
  • Duplicate USING produces error: EXECUTE 'SELECT $1' USING a USING b;

Phase 4: Cursor Operations and Diagnostics

4.1 OPEN Cursor

  • Open bound cursor: OPEN cur;
  • Open bound cursor with arguments: OPEN cur(1, 'a');
  • Open unbound cursor for query: OPEN cur FOR SELECT * FROM t;
  • Open unbound cursor with SCROLL: OPEN cur SCROLL FOR SELECT * FROM t;
  • Open unbound cursor with NO SCROLL: OPEN cur NO SCROLL FOR SELECT * FROM t;
  • Open for EXECUTE: OPEN cur FOR EXECUTE 'SELECT * FROM t';
  • Open for EXECUTE with USING: OPEN cur FOR EXECUTE 'SELECT * FROM t WHERE id=$1' USING my_id;

4.2 FETCH and MOVE

  • FETCH default (NEXT): FETCH cur INTO x;
  • FETCH NEXT: FETCH NEXT FROM cur INTO x;
  • FETCH PRIOR: FETCH PRIOR FROM cur INTO x;
  • FETCH FIRST: FETCH FIRST FROM cur INTO x;
  • FETCH LAST: FETCH LAST FROM cur INTO x;
  • FETCH ABSOLUTE n: FETCH ABSOLUTE 5 FROM cur INTO x;
  • FETCH RELATIVE n: FETCH RELATIVE -1 FROM cur INTO x;
  • FETCH FORWARD: FETCH FORWARD FROM cur INTO x;
  • FETCH FORWARD n: FETCH FORWARD 5 FROM cur INTO x;
  • FETCH FORWARD ALL: FETCH FORWARD ALL FROM cur INTO x;
  • FETCH BACKWARD: FETCH BACKWARD FROM cur INTO x;
  • FETCH BACKWARD n: FETCH BACKWARD 3 FROM cur INTO x;
  • FETCH BACKWARD ALL: FETCH BACKWARD ALL FROM cur INTO x;
  • FETCH with IN instead of FROM: FETCH NEXT IN cur INTO x;
  • MOVE (no INTO): MOVE NEXT FROM cur;
  • MOVE with direction: MOVE FORWARD ALL FROM cur;
  • MOVE ABSOLUTE: MOVE ABSOLUTE 0 FROM cur;

4.3 CLOSE and GET DIAGNOSTICS

  • CLOSE cursor: CLOSE cur;
  • GET DIAGNOSTICS (current): GET DIAGNOSTICS rowcount = ROW_COUNT;
  • GET CURRENT DIAGNOSTICS: GET CURRENT DIAGNOSTICS rowcount = ROW_COUNT;
  • GET DIAGNOSTICS multiple items: GET DIAGNOSTICS rc = ROW_COUNT, ctx = PG_CONTEXT;
  • GET STACKED DIAGNOSTICS: GET STACKED DIAGNOSTICS msg = MESSAGE_TEXT;
  • GET STACKED DIAGNOSTICS multiple: GET STACKED DIAGNOSTICS st = RETURNED_SQLSTATE, msg = MESSAGE_TEXT, det = PG_EXCEPTION_DETAIL;
  • Assignment operator := in diagnostics: GET DIAGNOSTICS rc := ROW_COUNT;
  • All current items: ROW_COUNT, PG_CONTEXT, PG_ROUTINE_OID
  • All stacked items: RETURNED_SQLSTATE, MESSAGE_TEXT, PG_EXCEPTION_DETAIL, PG_EXCEPTION_HINT, PG_EXCEPTION_CONTEXT, COLUMN_NAME, CONSTRAINT_NAME, PG_DATATYPE_NAME, TABLE_NAME, SCHEMA_NAME

Phase 5: Advanced Statements

5.1 RAISE

  • Bare RAISE (re-raise): RAISE;
  • RAISE with level and message: RAISE NOTICE 'hello %', name;
  • RAISE DEBUG: RAISE DEBUG 'debug info';
  • RAISE LOG: RAISE LOG 'log entry';
  • RAISE INFO: RAISE INFO 'info message';
  • RAISE WARNING: RAISE WARNING 'warning %', detail;
  • RAISE EXCEPTION (default level): RAISE EXCEPTION 'error occurred';
  • RAISE without level (defaults to EXCEPTION): RAISE 'error occurred';
  • RAISE with multiple params: RAISE NOTICE '% and % and %', a, b, c;
  • RAISE with USING: RAISE EXCEPTION 'fail' USING ERRCODE = 'P0001';
  • RAISE with multiple USING options: RAISE EXCEPTION 'fail' USING ERRCODE = 'P0001', DETAIL = 'more info', HINT = 'try this';
  • RAISE with condition name: RAISE division_by_zero;
  • RAISE with SQLSTATE: RAISE SQLSTATE '22012';
  • RAISE USING only: RAISE USING MESSAGE = 'dynamic error', ERRCODE = 'P0001';
  • All USING options: MESSAGE, DETAIL, HINT, ERRCODE, COLUMN, CONSTRAINT, DATATYPE, TABLE, SCHEMA

5.2 ASSERT, CALL, Transaction Control, NULL

  • ASSERT: ASSERT x > 0;
  • ASSERT with message: ASSERT x > 0, 'x must be positive';
  • CALL procedure: CALL my_proc(1, 2);
  • CALL with schema: CALL public.my_proc();
  • DO block: DO $$ BEGIN RAISE NOTICE 'hi'; END $$;
  • COMMIT: COMMIT;
  • COMMIT AND CHAIN: COMMIT AND CHAIN;
  • COMMIT AND NO CHAIN: COMMIT AND NO CHAIN;
  • ROLLBACK: ROLLBACK;
  • ROLLBACK AND CHAIN: ROLLBACK AND CHAIN;
  • ROLLBACK AND NO CHAIN: ROLLBACK AND NO CHAIN;
  • NULL statement: NULL;
  • NULL statement in empty branches: IF x THEN NULL; ELSE do_something(); END IF;

5.3 Exception Handling

  • Single WHEN clause: EXCEPTION WHEN division_by_zero THEN x := 0;
  • Multiple WHEN clauses: EXCEPTION WHEN no_data_found THEN ... WHEN too_many_rows THEN ...
  • OR conditions: EXCEPTION WHEN division_by_zero OR unique_violation THEN ...
  • SQLSTATE code: EXCEPTION WHEN SQLSTATE '22012' THEN ...
  • OTHERS catch-all: EXCEPTION WHEN OTHERS THEN ...
  • Multiple statements in handler body
  • GET STACKED DIAGNOSTICS inside handler
  • RAISE inside handler (re-raise): EXCEPTION WHEN OTHERS THEN RAISE;
  • Nested block with own exception handler
  • Exception block with no statements before EXCEPTION
  • Exception handler with SQLSTATE and named conditions combined via OR: WHEN SQLSTATE '23505' OR unique_violation THEN

Phase 6: Integration and Edge Cases

6.1 Nested and Complex Structures

  • Deeply nested blocks (3 levels): BEGIN BEGIN BEGIN ... END; END; END
  • Block inside IF branch
  • Block inside FOR loop body
  • Block inside exception handler
  • Loop inside IF inside block
  • FOR loop containing IF containing RETURN QUERY
  • Exception handler containing loop
  • Multiple DECLARE sections in nested blocks
  • Label scoping: inner label shadows outer

6.2 SQL Fragment Extraction

  • SQL expression in IF condition extracted as text until THEN
  • SQL expression in WHILE condition extracted until LOOP
  • SQL expression in RETURN extracted until semicolon
  • SQL in RETURN QUERY extracted as full SELECT text until semicolon
  • SQL in FOR..IN extracted correctly (distinguish integer range .. from query)
  • SQL in EXECUTE extracted until INTO/USING/semicolon
  • SQL in PERFORM extracted until semicolon
  • SQL in assignment RHS extracted until semicolon
  • Parenthesized expressions in SQL fragments don't cause false termination
  • Dollar-quoted strings within SQL fragments preserved
  • String literals with semicolons in SQL fragments don't cause false termination
  • Nested parentheses in SQL fragments tracked correctly

6.3 Error Reporting

  • Error position points to offending token
  • Missing semicolon after statement
  • Unexpected keyword in expression context
  • Unterminated block (missing END)
  • Unterminated IF (missing END IF)
  • Unterminated LOOP (missing END LOOP)
  • Unterminated CASE (missing END CASE)
  • Unknown statement keyword produces meaningful error
  • Label without matching block/loop
  • END label mismatch: <<a>> BEGIN END b;
  • Empty input produces error or empty block
  • Duplicate DECLARE sections in same block (should be single DECLARE before BEGIN)