
fix: prevent resource definition deletion when PVC still exists#429

Open
kvaps wants to merge 6 commits into piraeusdatastore:master from kvaps:fix/prevent-premature-rd-deletion

Conversation

@kvaps
Member

@kvaps kvaps commented Apr 3, 2026

Summary

Before deleting a resource definition, check if the corresponding Kubernetes PV
is still Bound to a PVC that is not being deleted. This prevents accidental RD
deletion when resources are stuck in DELETING state.

Problem

When DRBD resources get stuck in DELETING state (e.g. due to bitmap mismatch
after toggle-disk), deleteResourceDefinitionAndGroupIfUnused() checks only
whether all resources have the FlagDelete flag. If they do, it proceeds to
delete the entire resource definition -- even if the PVC is still active and
Bound.

This can lead to data loss: the PVC remains Bound, the VM continues running
with cached data, but the LINSTOR resource definition is deleted.

Fix

Add a safety check at the beginning of deleteResourceDefinitionAndGroupIfUnused():
query the Kubernetes API for the PV (by name, which matches the LINSTOR resource
name). If the PV exists, is not being deleted, and is Bound to a PVC that is also
not being deleted -- abort the RD deletion.

Uses the existing dynamic.Interface client pattern already used elsewhere in the
codebase.
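The decision logic described above can be sketched as a pure function. This is a minimal illustration, not the PR's actual code: the real check queries the PV and PVC through the existing dynamic.Interface client, while the struct types and names here are simplified stand-ins.

```go
package main

import "fmt"

// Simplified stand-ins for the Kubernetes PV/PVC state; the actual
// implementation fetches these objects via the dynamic.Interface client.
type pvState struct {
	exists       bool
	beingDeleted bool // deletionTimestamp set
	phase        string
	claimName    string
}

type pvcState struct {
	exists       bool
	beingDeleted bool
}

// safeToDeleteRD mirrors the safety check: abort RD deletion while the
// PV exists, is not terminating, and is Bound to a PVC that is also not
// terminating.
func safeToDeleteRD(pv pvState, pvc pvcState) bool {
	if !pv.exists || pv.beingDeleted {
		return true // no live PV: nothing protects the RD
	}
	if pv.phase != "Bound" {
		return true
	}
	if pvc.exists && !pvc.beingDeleted {
		return false // active Bound PVC: keep the RD
	}
	return true
}

func main() {
	fmt.Println(safeToDeleteRD(
		pvState{exists: true, phase: "Bound", claimName: "data-vm-0"},
		pvcState{exists: true},
	)) // prints false
}
```

Keeping the check at the top of deleteResourceDefinitionAndGroupIfUnused() means stuck-in-DELETING resources can no longer cascade into deleting an RD that a running workload still depends on.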

kvaps and others added 6 commits March 2, 2026 13:16
When restoring a snapshot, the previous logic always selected the first
available node from the snapshot's node list. This caused all clones of
the same source volume to restore on the same node, concentrating storage
load and potentially exhausting the thin pool on that single node.

Randomize the node selection among available candidates to distribute
restore operations evenly across snapshot nodes. Preferred (topology-
matching) nodes are still prioritized, but a random one is chosen when
multiple candidates exist.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
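The selection change this commit describes can be sketched as follows. Function and variable names are illustrative, not the commit's actual identifiers; the point is that preferred (topology-matching) nodes still win, but the pick within a tier is randomized instead of always taking the first entry.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickRestoreNode chooses a node for a snapshot restore. Preferred
// nodes are tried first; within the chosen tier a random candidate is
// selected, so clones of the same source spread across snapshot nodes
// instead of all landing on the first one in the list.
func pickRestoreNode(preferred, available []string) (string, bool) {
	if len(preferred) > 0 {
		return preferred[rand.Intn(len(preferred))], true
	}
	if len(available) > 0 {
		return available[rand.Intn(len(available))], true
	}
	return "", false
}

func main() {
	node, ok := pickRestoreNode(nil, []string{"node-a", "node-b", "node-c"})
	fmt.Println(node, ok)
}
```
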
…parameters

Add two new StorageClass parameters that trigger asynchronous relocation
of replicas to optimal nodes after clone or snapshot restore operations.

When cloning volumes or restoring from snapshots, LINSTOR places replicas
on the same nodes as the source. For golden image use cases this
concentrates all clones on source nodes. The new parameters use LINSTOR's
query-size-info API to determine optimal placement and migrate-disk API
to relocate replicas asynchronously.

Both parameters default to true. Relocation is best-effort: failures are
logged as warnings and do not block volume creation. An aux property on
the resource definition ensures idempotency across CSI controller retries.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
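The best-effort and idempotency behavior described above can be sketched like this. The linstorOps interface is a hypothetical seam standing in for the real LINSTOR client calls the commit names (query-size-info for placement, migrate-disk for the move, aux properties for idempotency); the actual client types and the property key differ.

```go
package main

import "fmt"

// linstorOps is a hypothetical abstraction over the LINSTOR operations
// used by the relocation step; the real client API looks different.
type linstorOps interface {
	GetAuxProp(rd, key string) (string, bool)
	SetAuxProp(rd, key, value string) error
	OptimalNode(rd string) (string, error) // via query-size-info
	MigrateDisk(rd, toNode string) error   // asynchronous relocation
}

const relocatedProp = "relocation-done" // illustrative key

// relocateBestEffort never fails volume creation: errors are logged as
// warnings and swallowed. The aux property on the resource definition
// makes CSI controller retries idempotent.
func relocateBestEffort(c linstorOps, rd string) {
	if _, done := c.GetAuxProp(rd, relocatedProp); done {
		return // a previous retry already relocated this RD
	}
	node, err := c.OptimalNode(rd)
	if err != nil {
		fmt.Printf("warning: query-size-info failed for %s: %v\n", rd, err)
		return
	}
	if err := c.MigrateDisk(rd, node); err != nil {
		fmt.Printf("warning: migrate-disk failed for %s: %v\n", rd, err)
		return
	}
	_ = c.SetAuxProp(rd, relocatedProp, "true")
}

// fakeOps is an in-memory stand-in used to demonstrate the flow.
type fakeOps struct {
	props      map[string]string
	migratedTo string
}

func (f *fakeOps) GetAuxProp(rd, key string) (string, bool) { v, ok := f.props[key]; return v, ok }
func (f *fakeOps) SetAuxProp(rd, key, v string) error       { f.props[key] = v; return nil }
func (f *fakeOps) OptimalNode(rd string) (string, error)    { return "node-b", nil }
func (f *fakeOps) MigrateDisk(rd, to string) error          { f.migratedTo = to; return nil }

func main() {
	f := &fakeOps{props: map[string]string{}}
	relocateBestEffort(f, "pvc-1234")
	fmt.Println(f.migratedTo, f.props[relocatedProp]) // prints node-b true
}
```
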
Move the snapshot restore relocation parameter from StorageClass
to VolumeSnapshotClass where it belongs. This prevents unwanted
relocation when Velero creates temporary PVCs during data mover
backup (snapshotMoveData), since those PVCs use StorageClass
parameters but not VolumeSnapshotClass parameters.

Changes:
- Remove relocateAfterSnapshotRestore from StorageClass parameters
- Add snap.linstor.csi.linbit.com/relocate-after-restore to
  VolumeSnapshotClass parameters
- Change relocateAfterClone default to false

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Rename snap.linstor.csi.linbit.com/relocate-after-restore to
snap.linstor.csi.linbit.com/relocateAfterRestore for consistency
with StorageClass parameter naming convention.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
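After this rename, the parameter placement described by the two commits above would look roughly like the following manifests. Only the parameter keys (snap.linstor.csi.linbit.com/relocateAfterRestore, relocateAfterClone) come from the commits; the object names and other field values are illustrative.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: linstor-snapshots        # illustrative name
driver: linstor.csi.linbit.com
deletionPolicy: Delete
parameters:
  snap.linstor.csi.linbit.com/relocateAfterRestore: "true"
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-fast             # illustrative name
provisioner: linstor.csi.linbit.com
parameters:
  relocateAfterClone: "false"    # new default per this PR
```

Keying the restore-time behavior off the VolumeSnapshotClass means Velero's temporary data-mover PVCs, which only carry StorageClass parameters, no longer trigger relocation.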
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Before deleting a resource definition, check if the corresponding
Kubernetes PV is still Bound to a PVC that is not being deleted.
This prevents accidental RD deletion when resources are stuck in
DELETING state (e.g. due to DRBD bitmap mismatch), which could
cause data loss for active volumes.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@JoelColledge
Contributor

Thanks for looking into this. There are commits in this PR that are unrelated to the description. Please keep the PR focused on one topic.

