Goal: Hand-written PL/pgSQL recursive descent parser matching omni pg/parser architecture with full PostgreSQL PL/pgSQL compatibility Verification: Go unit tests — each scenario is a PL/pgSQL body string that must parse to the correct AST (or produce the correct error) Reference sources: PostgreSQL src/pl/plpgsql/src/pl_gram.y, plpgsql.h, pl_scanner.c, pl_reserved_kwlist.h, pl_unreserved_kwlist.h; PostgreSQL docs Chapter 43
Status: [ ] pending, [x] passing, [~] partial
- PLBlock node — cmd_type, label, body (statement list), declarations, exception block
- PLDeclare node — variable name, type reference, constant flag, not-null flag, collation, default expression
- PLCursorDecl node — cursor name, scroll option, argument list, query text
- PLAliasDecl node — alias name, referenced variable name
- PLIf node — condition text, then-body, elsif list, else-body
- PLCase node — test expression text, case-when list, else-body, has-test flag
- PLLoop node — label, body
- PLWhile node — label, condition text, body
- PLForI node — label, variable, lower/upper/step expression texts, reverse flag, body
- PLForS node — label, variable, query text, body
- PLForC node — label, variable, cursor variable, arg query text, body
- PLForDynS node — label, variable, query text, params list, body
- PLForEachA node — label, variable, slice dimension, array expression text, body
- PLReturn node — expression text (optional)
- PLReturnNext node — expression text
- PLReturnQuery node — query text (static), dynamic query text, params list
- PLAssign node — variable target, expression text
- PLExecSQL node — SQL text, into-target, strict flag
- PLDynExecute node — query text, into-target, strict flag, params list
- PLPerform node — expression text
- PLCall node — SQL text (CALL or DO)
- PLRaise node — level, condition name, sqlstate, message text, params, options list
- PLAssert node — condition text, message text
- PLGetDiag node — is-stacked flag, diagnostic items list
- PLOpen node — cursor variable, query text, dynamic query text, params, scroll option
- PLFetch node — cursor variable, direction, direction count, into-target, is-move flag
- PLClose node — cursor variable
- PLExit node — is-continue flag, label, condition text
- PLCommit node — chain flag
- PLRollback node — chain flag
- PLNull node (null statement)
- All nodes implement omni Node interface with Tag() method
- All nodes carry Loc (start, end byte offsets)
- Reserved keywords: ALL, BEGIN, BY, CASE, DECLARE, ELSE, END, EXECUTE, FOR, FOREACH, FROM, IF, IN, INTO, LOOP, NOT, NULL, OR, STRICT, THEN, TO, USING, WHEN, WHILE
- Unreserved keywords recognized: ABSOLUTE, ALIAS, AND, ARRAY, ASSERT, BACKWARD, CALL, CHAIN, CLOSE, COLLATE, COLUMN, COLUMN_NAME, COMMIT, CONSTANT, CONSTRAINT, CONSTRAINT_NAME, CONTINUE, CURRENT, CURSOR, DATATYPE, DEBUG, DEFAULT, DETAIL, DIAGNOSTICS, DO, DUMP, ELSIF, ELSEIF, ERRCODE, ERROR, EXCEPTION, EXIT, FETCH, FIRST, FORWARD, GET, HINT, IMPORT, INFO, INSERT, IS, LAST, LOG, MERGE, MESSAGE, MESSAGE_TEXT, MOVE, NEXT, NO, NOTICE, OPEN, OPTION, PERFORM, PG_CONTEXT, PG_DATATYPE_NAME, PG_EXCEPTION_CONTEXT, PG_EXCEPTION_DETAIL, PG_EXCEPTION_HINT, PG_ROUTINE_OID, PRINT_STRICT_PARAMS, PRIOR, QUERY, RAISE, RELATIVE, RETURN, RETURNED_SQLSTATE, REVERSE, ROLLBACK, ROW_COUNT, ROWTYPE, SCHEMA, SCHEMA_NAME, SCROLL, SLICE, SQLSTATE, STACKED, TABLE, TABLE_NAME, TYPE, USE_COLUMN, USE_VARIABLE, VARIABLE_CONFLICT, WARNING
- Unreserved keywords usable as variable names in declarations
- Reserved keywords NOT usable as variable names
- Keyword lookup is case-insensitive
-
#option dumpdirective before block -
#print_strict_params ondirective -
#print_strict_params offdirective -
#variable_conflict errordirective -
#variable_conflict use_variabledirective -
#variable_conflict use_columndirective - Multiple directives before block
- Unknown
#optionproduces error
- Entry point accepts PL/pgSQL function body string, returns AST root
- Empty block:
BEGIN ENDparses to PLBlock with empty body - Empty block with semicolons:
BEGIN END;parses correctly - Block with DECLARE and no variables:
DECLARE BEGIN END - Block with trailing label:
<<lbl>> BEGIN END lbl - Label mismatch at END produces error
- Mismatched END keyword (e.g.
END IFin block) produces error - Unexpected token at top level produces error with position
- Missing END produces error
- Multiple statements in block separated by semicolons
- Nested block:
BEGIN BEGIN END; END - Labeled nested block:
<<inner>> BEGIN END inner; - Label between DECLARE and BEGIN produces error:
DECLARE <<lbl>> BEGIN END; - Extra DECLARE keyword within section:
DECLARE DECLARE x int; BEGIN ENDparses correctly
- Scalar:
DECLARE x integer; - With default
:=:DECLARE x integer := 0; - With default
=:DECLARE x integer = 0; - With default
DEFAULT:DECLARE x integer DEFAULT 0; - CONSTANT:
DECLARE x CONSTANT integer := 42; - NOT NULL:
DECLARE x integer NOT NULL := 1; - CONSTANT NOT NULL:
DECLARE x CONSTANT integer NOT NULL := 1; - COLLATE:
DECLARE x text COLLATE "en_US"; - COLLATE with NOT NULL:
DECLARE x text COLLATE "C" NOT NULL := 'a'; - Record type:
DECLARE r record; - Qualified type:
DECLARE x public.my_type; - %TYPE reference:
DECLARE x my_table.my_column%TYPE; - %ROWTYPE reference:
DECLARE r my_table%ROWTYPE; - Multiple variables:
DECLARE x int; y text; z boolean; - Cursor declaration:
DECLARE c CURSOR FOR SELECT 1; - Cursor with SCROLL:
DECLARE c SCROLL CURSOR FOR SELECT 1; - Cursor with NO SCROLL:
DECLARE c NO SCROLL CURSOR FOR SELECT 1; - Cursor with parameters:
DECLARE c CURSOR (p1 int, p2 text) FOR SELECT p1, p2; - Cursor with IS (synonym for FOR):
DECLARE c CURSOR IS SELECT 1; - Alias declaration:
DECLARE a ALIAS FOR $1; - Alias for named parameter:
DECLARE a ALIAS FOR param_name; - Expression default spans until semicolon:
DECLARE x int := 1 + 2 * 3; - Array type:
DECLARE a integer[]; - Array type with dimension:
DECLARE a integer[10]; - NOT NULL without default produces error:
DECLARE x integer NOT NULL; - CONSTANT without default produces error:
DECLARE x CONSTANT integer;
- Simple IF-THEN-END IF:
IF x > 0 THEN y := 1; END IF; - IF-THEN-ELSE:
IF x > 0 THEN y := 1; ELSE y := 0; END IF; - IF-THEN-ELSIF-ELSE:
IF x > 0 THEN y := 1; ELSIF x = 0 THEN y := 0; ELSE y := -1; END IF; - Multiple ELSIF branches: three or more ELSIF clauses
- ELSEIF synonym:
IF x THEN y := 1; ELSEIF z THEN y := 2; END IF; - Nested IF inside IF:
IF a THEN IF b THEN c := 1; END IF; END IF; - Condition expression spans until THEN:
IF x > 0 AND y < 10 THEN ... - Empty THEN body:
IF x THEN END IF;— parses with empty body - Multiple statements in THEN body
- Multiple statements in ELSE body
- Missing THEN produces error
- Missing END IF produces error
- Searched CASE:
CASE WHEN x > 0 THEN y := 1; WHEN x = 0 THEN y := 0; END CASE; - Searched CASE with ELSE:
CASE WHEN x > 0 THEN y := 1; ELSE y := -1; END CASE; - Simple CASE:
CASE x WHEN 1 THEN y := 'a'; WHEN 2 THEN y := 'b'; END CASE; - Simple CASE with ELSE:
CASE x WHEN 1 THEN y := 'a'; ELSE y := 'z'; END CASE; - Multiple WHEN branches
- Multiple statements per WHEN body
- Nested CASE inside CASE
- Missing END CASE produces error
- No WHEN clause produces error
- Infinite LOOP:
LOOP x := x + 1; EXIT WHEN x > 10; END LOOP; - Labeled LOOP:
<<myloop>> LOOP EXIT myloop; END LOOP myloop; - WHILE loop:
WHILE x > 0 LOOP x := x - 1; END LOOP; - Labeled WHILE:
<<w>> WHILE x > 0 LOOP x := x - 1; END LOOP w; - Condition expression spans until LOOP keyword
- Nested loops: LOOP inside LOOP
- Empty loop body:
LOOP END LOOP; - Missing END LOOP produces error
- WHILE missing LOOP keyword produces error
- Integer FOR:
FOR i IN 1..10 LOOP x := x + i; END LOOP; - Integer FOR REVERSE:
FOR i IN REVERSE 10..1 LOOP x := x + i; END LOOP; - Integer FOR with BY:
FOR i IN 1..100 BY 5 LOOP x := x + i; END LOOP; - Integer FOR REVERSE with BY:
FOR i IN REVERSE 100..1 BY 10 LOOP END LOOP; - Integer FOR with expression bounds:
FOR i IN a+1..b*2 LOOP END LOOP; - Query FOR:
FOR rec IN SELECT * FROM t LOOP x := rec.a; END LOOP; - Query FOR with complex query:
FOR rec IN SELECT a, b FROM t WHERE c > 0 ORDER BY a LOOP END LOOP; - Cursor FOR:
FOR rec IN cur LOOP END LOOP;(cur is cursor variable) - Cursor FOR with arguments:
FOR rec IN cur(1, 'a') LOOP END LOOP; - Dynamic FOR:
FOR rec IN EXECUTE 'SELECT * FROM ' || tbl LOOP END LOOP; - Dynamic FOR with USING:
FOR rec IN EXECUTE 'SELECT * FROM t WHERE id = $1' USING my_id LOOP END LOOP; - FOREACH ARRAY:
FOREACH x IN ARRAY arr LOOP END LOOP; - FOREACH ARRAY with SLICE:
FOREACH x SLICE 1 IN ARRAY arr LOOP END LOOP; - Labeled FOR:
<<fl>> FOR i IN 1..10 LOOP END LOOP fl; - Label mismatch on FOR loop END produces error
- Nested FOR loops
- Bare EXIT:
EXIT; - EXIT with label:
EXIT myloop; - EXIT with WHEN:
EXIT WHEN x > 10; - EXIT with label and WHEN:
EXIT myloop WHEN x > 10; - Bare CONTINUE:
CONTINUE; - CONTINUE with label:
CONTINUE myloop; - CONTINUE with WHEN:
CONTINUE WHEN x = 0; - CONTINUE with label and WHEN:
CONTINUE myloop WHEN x = 0; - WHEN condition expression spans until semicolon
- Simple assignment:
x := 1; - Expression assignment:
x := a + b * c; - Function call assignment:
x := my_func(a, b); - Subquery assignment:
x := (SELECT max(a) FROM t); - Record field assignment:
rec.field := 42; - Array element assignment:
arr[1] := 'hello'; - Array slice assignment:
arr[1:3] := ARRAY[1,2,3]; - Multi-level field:
rec.nested.field := 1; - Assignment with complex RHS spanning until semicolon
- Assignment with
=operator (PG also accepts this):x = 1;
- Simple RETURN:
RETURN; - RETURN with expression:
RETURN x + 1; - RETURN with subquery:
RETURN (SELECT count(*) FROM t); - RETURN NEXT:
RETURN NEXT x; - RETURN NEXT with record:
RETURN NEXT rec; - RETURN NEXT bare (for OUT params):
RETURN NEXT; - RETURN QUERY static:
RETURN QUERY SELECT * FROM t; - RETURN QUERY with WHERE:
RETURN QUERY SELECT a, b FROM t WHERE c > 0; - RETURN QUERY EXECUTE:
RETURN QUERY EXECUTE 'SELECT * FROM t'; - RETURN QUERY EXECUTE with USING:
RETURN QUERY EXECUTE 'SELECT * FROM t WHERE id = $1' USING my_id; - RETURN QUERY EXECUTE with multiple USING:
RETURN QUERY EXECUTE $q$ SELECT $1, $2 $q$ USING a, b; - Expression text spans until semicolon for RETURN value
- PERFORM:
PERFORM my_func(1, 2); - PERFORM with query:
PERFORM * FROM t WHERE a > 0;(converted from SELECT) - Bare SQL — INSERT:
INSERT INTO t VALUES (1, 2); - Bare SQL — UPDATE:
UPDATE t SET a = 1 WHERE b = 2; - Bare SQL — DELETE:
DELETE FROM t WHERE a = 1; - SQL with INTO:
SELECT a, b INTO x, y FROM t WHERE id = 1; - SQL with INTO STRICT:
SELECT a INTO STRICT x FROM t WHERE id = 1; - SQL statement text extracted until semicolon
- SQL keywords (INSERT, UPDATE, DELETE, SELECT, MERGE, IMPORT) recognized as statement starters
- Bare SQL — IMPORT:
IMPORT FOREIGN SCHEMA s FROM SERVER srv INTO public;
- Basic EXECUTE:
EXECUTE 'SELECT 1'; - EXECUTE with INTO:
EXECUTE 'SELECT a FROM t' INTO x; - EXECUTE with INTO STRICT:
EXECUTE 'SELECT a FROM t' INTO STRICT x; - EXECUTE with USING:
EXECUTE 'INSERT INTO t VALUES($1)' USING val; - EXECUTE with multiple USING:
EXECUTE $q$ SELECT $1, $2 $q$ USING a, b; - EXECUTE with INTO and USING:
EXECUTE 'SELECT a FROM t WHERE id=$1' INTO x USING my_id; - EXECUTE with USING before INTO:
EXECUTE 'SELECT $1' USING a INTO x; - EXECUTE expression concatenation:
EXECUTE 'SELECT * FROM ' || quote_ident(tbl); - EXECUTE with format():
EXECUTE format('SELECT * FROM %I', tbl); - Duplicate INTO produces error:
EXECUTE 'SELECT 1' INTO x INTO y; - Duplicate USING produces error:
EXECUTE 'SELECT $1' USING a USING b;
- Open bound cursor:
OPEN cur; - Open bound cursor with arguments:
OPEN cur(1, 'a'); - Open unbound cursor for query:
OPEN cur FOR SELECT * FROM t; - Open unbound cursor with SCROLL:
OPEN cur SCROLL FOR SELECT * FROM t; - Open unbound cursor with NO SCROLL:
OPEN cur NO SCROLL FOR SELECT * FROM t; - Open for EXECUTE:
OPEN cur FOR EXECUTE 'SELECT * FROM t'; - Open for EXECUTE with USING:
OPEN cur FOR EXECUTE 'SELECT * FROM t WHERE id=$1' USING my_id;
- FETCH default (NEXT):
FETCH cur INTO x; - FETCH NEXT:
FETCH NEXT FROM cur INTO x; - FETCH PRIOR:
FETCH PRIOR FROM cur INTO x; - FETCH FIRST:
FETCH FIRST FROM cur INTO x; - FETCH LAST:
FETCH LAST FROM cur INTO x; - FETCH ABSOLUTE n:
FETCH ABSOLUTE 5 FROM cur INTO x; - FETCH RELATIVE n:
FETCH RELATIVE -1 FROM cur INTO x; - FETCH FORWARD:
FETCH FORWARD FROM cur INTO x; - FETCH FORWARD n:
FETCH FORWARD 5 FROM cur INTO x; - FETCH FORWARD ALL:
FETCH FORWARD ALL FROM cur INTO x; - FETCH BACKWARD:
FETCH BACKWARD FROM cur INTO x; - FETCH BACKWARD n:
FETCH BACKWARD 3 FROM cur INTO x; - FETCH BACKWARD ALL:
FETCH BACKWARD ALL FROM cur INTO x; - FETCH with IN instead of FROM:
FETCH NEXT IN cur INTO x; - MOVE (no INTO):
MOVE NEXT FROM cur; - MOVE with direction:
MOVE FORWARD ALL FROM cur; - MOVE ABSOLUTE:
MOVE ABSOLUTE 0 FROM cur;
- CLOSE cursor:
CLOSE cur; - GET DIAGNOSTICS (current):
GET DIAGNOSTICS rowcount = ROW_COUNT; - GET CURRENT DIAGNOSTICS:
GET CURRENT DIAGNOSTICS rowcount = ROW_COUNT; - GET DIAGNOSTICS multiple items:
GET DIAGNOSTICS rc = ROW_COUNT, ctx = PG_CONTEXT; - GET STACKED DIAGNOSTICS:
GET STACKED DIAGNOSTICS msg = MESSAGE_TEXT; - GET STACKED DIAGNOSTICS multiple:
GET STACKED DIAGNOSTICS st = RETURNED_SQLSTATE, msg = MESSAGE_TEXT, det = PG_EXCEPTION_DETAIL; - Assignment operator
:=in diagnostics:GET DIAGNOSTICS rc := ROW_COUNT; - All current items: ROW_COUNT, PG_CONTEXT, PG_ROUTINE_OID
- All stacked items: RETURNED_SQLSTATE, MESSAGE_TEXT, PG_EXCEPTION_DETAIL, PG_EXCEPTION_HINT, PG_EXCEPTION_CONTEXT, COLUMN_NAME, CONSTRAINT_NAME, PG_DATATYPE_NAME, TABLE_NAME, SCHEMA_NAME
- Bare RAISE (re-raise):
RAISE; - RAISE with level and message:
RAISE NOTICE 'hello %', name; - RAISE DEBUG:
RAISE DEBUG 'debug info'; - RAISE LOG:
RAISE LOG 'log entry'; - RAISE INFO:
RAISE INFO 'info message'; - RAISE WARNING:
RAISE WARNING 'warning %', detail; - RAISE EXCEPTION (default level):
RAISE EXCEPTION 'error occurred'; - RAISE without level (defaults to EXCEPTION):
RAISE 'error occurred'; - RAISE with multiple params:
RAISE NOTICE '% and % and %', a, b, c; - RAISE with USING:
RAISE EXCEPTION 'fail' USING ERRCODE = 'P0001'; - RAISE with multiple USING options:
RAISE EXCEPTION 'fail' USING ERRCODE = 'P0001', DETAIL = 'more info', HINT = 'try this'; - RAISE with condition name:
RAISE division_by_zero; - RAISE with SQLSTATE:
RAISE SQLSTATE '22012'; - RAISE USING only:
RAISE USING MESSAGE = 'dynamic error', ERRCODE = 'P0001'; - All USING options: MESSAGE, DETAIL, HINT, ERRCODE, COLUMN, CONSTRAINT, DATATYPE, TABLE, SCHEMA
- ASSERT:
ASSERT x > 0; - ASSERT with message:
ASSERT x > 0, 'x must be positive'; - CALL procedure:
CALL my_proc(1, 2); - CALL with schema:
CALL public.my_proc(); - DO block:
DO $$ BEGIN RAISE NOTICE 'hi'; END $$; - COMMIT:
COMMIT; - COMMIT AND CHAIN:
COMMIT AND CHAIN; - COMMIT AND NO CHAIN:
COMMIT AND NO CHAIN; - ROLLBACK:
ROLLBACK; - ROLLBACK AND CHAIN:
ROLLBACK AND CHAIN; - ROLLBACK AND NO CHAIN:
ROLLBACK AND NO CHAIN; - NULL statement:
NULL; - NULL statement in empty branches:
IF x THEN NULL; ELSE do_something(); END IF;
- Single WHEN clause:
EXCEPTION WHEN division_by_zero THEN x := 0; - Multiple WHEN clauses:
EXCEPTION WHEN no_data_found THEN ... WHEN too_many_rows THEN ... - OR conditions:
EXCEPTION WHEN division_by_zero OR unique_violation THEN ... - SQLSTATE code:
EXCEPTION WHEN SQLSTATE '22012' THEN ... - OTHERS catch-all:
EXCEPTION WHEN OTHERS THEN ... - Multiple statements in handler body
- GET STACKED DIAGNOSTICS inside handler
- RAISE inside handler (re-raise):
EXCEPTION WHEN OTHERS THEN RAISE; - Nested block with own exception handler
- Exception block with no statements before EXCEPTION
- Exception handler with SQLSTATE and named conditions combined via OR:
WHEN SQLSTATE '23505' OR unique_violation THEN
- Deeply nested blocks (3 levels): BEGIN BEGIN BEGIN ... END; END; END
- Block inside IF branch
- Block inside FOR loop body
- Block inside exception handler
- Loop inside IF inside block
- FOR loop containing IF containing RETURN QUERY
- Exception handler containing loop
- Multiple DECLARE sections in nested blocks
- Label scoping: inner label shadows outer
- SQL expression in IF condition extracted as text until THEN
- SQL expression in WHILE condition extracted until LOOP
- SQL expression in RETURN extracted until semicolon
- SQL in RETURN QUERY extracted as full SELECT text until semicolon
- SQL in FOR..IN extracted correctly (distinguish integer range
..from query) - SQL in EXECUTE extracted until INTO/USING/semicolon
- SQL in PERFORM extracted until semicolon
- SQL in assignment RHS extracted until semicolon
- Parenthesized expressions in SQL fragments don't cause false termination
- Dollar-quoted strings within SQL fragments preserved
- String literals with semicolons in SQL fragments don't cause false termination
- Nested parentheses in SQL fragments tracked correctly
- Error position points to offending token
- Missing semicolon after statement
- Unexpected keyword in expression context
- Unterminated block (missing END)
- Unterminated IF (missing END IF)
- Unterminated LOOP (missing END LOOP)
- Unterminated CASE (missing END CASE)
- Unknown statement keyword produces meaningful error
- Label without matching block/loop
- END label mismatch:
<<a>> BEGIN END b; - Empty input produces error or empty block
- Duplicate DECLARE sections in same block (should be single DECLARE before BEGIN)