Fix replay of create database record for missing tablespace. #1585

Open
reshke wants to merge 3 commits into main from fix_for_db_records

reshke commented Mar 1, 2026

GPDB/Cloudberry had a problem with DROP TABLESPACE when it is preceded by a CREATE DATABASE and DROP DATABASE.

From 2f0a2f8 commit message:

    CREATE DATABASE
    DROP DATABASE
    DROP TABLESPACE

If, after replaying the last WAL record and removing the
tablespace directory, the standby crashes and has to replay the
create database record again, crash recovery must be able to continue.

This problem was fixed in GPDB by 7a09e80.

Its commit message says: "Let's revert or update the code change after the solution is finalized on upstream."

A fix was later committed upstream (PostgreSQL), and it differs slightly from ours.

This PR reverts commit 7a09e80 and applies the upstream fix.
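
For readers who want the failure mode spelled out, here is a minimal toy model of the three-record sequence in C. It is not PostgreSQL or Cloudberry code: `StandbyState`, `replay`, `replay_all`, and the `create_missing` flag are hypothetical names invented for this sketch. It only illustrates why replaying the create database record a second time fails once the tablespace directory is gone, and how creating the missing directory lets replay continue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and functions are illustrative assumptions. */
typedef enum { REC_CREATE_DB, REC_DROP_DB, REC_DROP_TABLESPACE } RecType;

typedef struct {
    bool tablespace_dir_exists;
    bool database_dir_exists;
} StandbyState;

/* Replay one record; returns false where recovery could not continue. */
static bool replay(StandbyState *s, RecType rec, bool create_missing)
{
    assert(s != NULL);
    switch (rec) {
    case REC_CREATE_DB:
        if (!s->tablespace_dir_exists) {
            if (!create_missing)
                return false;                 /* missing parent directory stops replay */
            s->tablespace_dir_exists = true;  /* recreate the missing directory */
        }
        s->database_dir_exists = true;
        return true;
    case REC_DROP_DB:
        s->database_dir_exists = false;
        return true;
    case REC_DROP_TABLESPACE:
        s->tablespace_dir_exists = false;
        return true;
    }
    return false;
}

/* Replay CREATE DATABASE, DROP DATABASE, DROP TABLESPACE from state *s. */
static bool replay_all(StandbyState *s, bool create_missing)
{
    const RecType wal[] = { REC_CREATE_DB, REC_DROP_DB, REC_DROP_TABLESPACE };
    for (int i = 0; i < 3; i++)
        if (!replay(s, wal[i], create_missing))
            return false;
    return true;
}
```

Calling `replay_all` once on a state where the tablespace directory exists succeeds and leaves the directory removed. Calling it again on that resulting state models the crash-and-replay case: it fails with `create_missing` set to false and succeeds with it set to true.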

reshke and others added 3 commits March 1, 2026 12:13
…t but we need them when re-redoing some tablespace related xlogs (e.g. database create with a tablespace) on mirror."

This reverts commit 7a09e80.

Crash recovery on standby may encounter missing directories
when replaying database-creation WAL records.  Prior to this
patch, the standby would fail to recover in such a case;
however, the directories could be legitimately missing.
Consider the following sequence of commands:

    CREATE DATABASE
    DROP DATABASE
    DROP TABLESPACE

If, after replaying the last WAL record and removing the
tablespace directory, the standby crashes and has to replay the
create database record again, crash recovery must be able to continue.

A fix for this problem was already attempted in 49d9cfc, but it
was reverted because of design issues.  This new version is based
on Robert Haas' proposal: any missing tablespaces are created
during recovery before reaching consistency.  Tablespaces
are created as real directories, and should be deleted
by later replay.  CheckRecoveryConsistency ensures
they have disappeared.

The problems detected by this new code are reported as PANIC,
except when allow_in_place_tablespaces is set to ON, in which
case they are WARNING.  Apart from making tests possible, this
gives users an escape hatch in case things don't go as planned.
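
The consistency check described above can be sketched as a small C model. This is an illustration under stated assumptions, not the upstream implementation: the commit message only says that missing tablespaces are created as real directories and that CheckRecoveryConsistency ensures they have disappeared. The tracked list `MissingDirs`, the helper names, and the path `pg_tblspc/16384` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_MISSING 16

/* Toy severity levels standing in for the PANIC and WARNING reports. */
typedef enum { LEVEL_OK, LEVEL_WARNING, LEVEL_PANIC } Level;

/* Hypothetical bookkeeping of directories created during recovery. */
typedef struct {
    const char *paths[MAX_MISSING];
    int count;
} MissingDirs;

/* Record a directory that replay had to create because it was missing. */
static void remember_created(MissingDirs *m, const char *path)
{
    assert(m->count < MAX_MISSING);
    m->paths[m->count++] = path;
}

/* Later replay (for example the tablespace drop) removed the directory. */
static void forget_removed(MissingDirs *m, const char *path)
{
    for (int i = 0; i < m->count; i++) {
        if (strcmp(m->paths[i], path) == 0) {
            m->paths[i] = m->paths[--m->count];  /* swap-remove the entry */
            return;
        }
    }
}

/* On reaching consistency, leftover directories are a problem:
 * PANIC by default, WARNING when allow_in_place_tablespaces is on. */
static Level check_at_consistency(const MissingDirs *m,
                                  bool allow_in_place_tablespaces)
{
    if (m->count == 0)
        return LEVEL_OK;
    return allow_in_place_tablespaces ? LEVEL_WARNING : LEVEL_PANIC;
}
```

In this model, a directory that is remembered and never forgotten yields `LEVEL_PANIC`, or `LEVEL_WARNING` when the escape hatch is enabled. Once later replay removes it, the check returns `LEVEL_OK`.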

Author: Kyotaro Horiguchi
Author: Asim R Praveen
Author: Paul Guo
Reviewed-by: Anastasia Lubennikova (older versions)
Reviewed-by: Fujii Masao (older versions)
Reviewed-by: Michaël Paquier
Diagnosed-by: Paul Guo
Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27DN5xWJ2P9-ROH16e4JUA@mail.gmail.com

On FreeBSD, the new test fails due to a WAL file being removed before
the standby has had the chance to copy it.  Fix by adding a replication
slot to prevent the removal until after the standby has connected.

Author: Kyotaro Horiguchi
Reported-by: Matthias van de Meent
Discussion: https://postgr.es/m/CAEze2Wj5nau_qpjbwihvmXLfkAWOZ5TKdbnqOc6nKSiRJEoPyQ@mail.gmail.com