-
Notifications
You must be signed in to change notification settings - Fork 91
fix(luks): use fixed data offset for consistent LUKS header size #472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Different systems may create LUKS2 with different default header sizes (16 MiB vs 32 MiB) depending on system configuration. This causes "Low.dev. smaller than requested DRBD-dev. size" errors when performing toggle-disk operations, as the DRBD device expects a certain size from peer nodes but the local LUKS device has less usable space due to a larger header. Fix by always specifying --offset 32768 (16 MiB) when creating LUKS devices to ensure consistent header size across all nodes. Co-Authored-By: Claude <[email protected]> Signed-off-by: Andrei Kvapil <[email protected]>
## What this PR does Build piraeus-server (linstor-server) from source with custom patches: - **adjust-on-resfile-change.diff** — Use actual device path in res file during toggle-disk; fix LUKS data offset - Upstream: [#473](LINBIT/linstor-server#473), [#472](LINBIT/linstor-server#472) - **allow-toggle-disk-retry.diff** — Allow retry and cancellation of failed toggle-disk operations - Upstream: [#475](LINBIT/linstor-server#475) - **force-metadata-check-on-disk-add.diff** — Create metadata during toggle-disk from diskless to diskful - Upstream: [#474](LINBIT/linstor-server#474) - **skip-adjust-when-device-inaccessible.diff** — Skip DRBD adjust/res file regeneration when child layer device is inaccessible - Upstream: [#471](LINBIT/linstor-server#471) Also updates plunger-satellite script and values.yaml for the new build. ### Release note ```release-note [linstor] Build linstor-server with custom patches for improved disk handling ``` <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added automatic DRBD stall detection and recovery, improving storage resync resilience without manual intervention. * Introduced configurable container image references via Helm values for streamlined deployment. * **Bug Fixes** * Enhanced disk toggle operations with retry and cancellation support for better error handling. * Improved metadata creation during disk state transitions. * Added device accessibility checks to prevent errors when underlying storage devices are unavailable. * Fixed LUKS encryption header sizing for consistent deployment across nodes. <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> <!-- end of auto-generated comment: release notes by coderabbit.ai -->
|
LGTM, except we can't merge yet, as we have to remove rhel7 support first, as the old cryptsetup doesn't work with this. for the other PR's I'll have to review/check them together with Gabor, who will be back next week. |
## What this PR does Build piraeus-server (linstor-server) from source with custom patches: - **adjust-on-resfile-change.diff** — Use actual device path in res file during toggle-disk; fix LUKS data offset - Upstream: [#473](LINBIT/linstor-server#473), [#472](LINBIT/linstor-server#472) - **allow-toggle-disk-retry.diff** — Allow retry and cancellation of failed toggle-disk operations - Upstream: [#475](LINBIT/linstor-server#475) - **force-metadata-check-on-disk-add.diff** — Create metadata during toggle-disk from diskless to diskful - Upstream: [#474](LINBIT/linstor-server#474) - **skip-adjust-when-device-inaccessible.diff** — Skip DRBD adjust/res file regeneration when child layer device is inaccessible - Upstream: [#471](LINBIT/linstor-server#471) Also updates plunger-satellite script and values.yaml for the new build. ### Release note ```release-note [linstor] Build linstor-server with custom patches for improved disk handling ``` <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added automatic DRBD stall detection and recovery, improving storage resync resilience without manual intervention. * Introduced configurable container image references via Helm values for streamlined deployment. * **Bug Fixes** * Enhanced disk toggle operations with retry and cancellation support for better error handling. * Improved metadata creation during disk state transitions. * Added device accessibility checks to prevent errors when underlying storage devices are unavailable. * Fixed LUKS encryption header sizing for consistent deployment across nodes. <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does Build piraeus-server (linstor-server) from source with custom patches: - **adjust-on-resfile-change.diff** — Use actual device path in res file during toggle-disk; fix LUKS data offset - Upstream: [#473](LINBIT/linstor-server#473), [#472](LINBIT/linstor-server#472) - **allow-toggle-disk-retry.diff** — Allow retry and cancellation of failed toggle-disk operations - Upstream: [#475](LINBIT/linstor-server#475) - **force-metadata-check-on-disk-add.diff** — Create metadata during toggle-disk from diskless to diskful - Upstream: [#474](LINBIT/linstor-server#474) - **skip-adjust-when-device-inaccessible.diff** — Skip DRBD adjust/res file regeneration when child layer device is inaccessible - Upstream: [#471](LINBIT/linstor-server#471) Also updates plunger-satellite script and values.yaml for the new build. ### Release note ```release-note [linstor] Build linstor-server with custom patches for improved disk handling ``` <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added automatic DRBD stall detection and recovery, improving storage resync resilience without manual intervention. * Introduced configurable container image references via Helm values for streamlined deployment. * **Bug Fixes** * Enhanced disk toggle operations with retry and cancellation support for better error handling. * Improved metadata creation during disk state transitions. * Added device accessibility checks to prevent errors when underlying storage devices are unavailable. * Fixed LUKS encryption header sizing for consistent deployment across nodes. <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does Build piraeus-server (linstor-server) from source with custom patches: - **adjust-on-resfile-change.diff** — Use actual device path in res file during toggle-disk; fix LUKS data offset - Upstream: [#473](LINBIT/linstor-server#473), [#472](LINBIT/linstor-server#472) - **allow-toggle-disk-retry.diff** — Allow retry and cancellation of failed toggle-disk operations - Upstream: [#475](LINBIT/linstor-server#475) - **force-metadata-check-on-disk-add.diff** — Create metadata during toggle-disk from diskless to diskful - Upstream: [#474](LINBIT/linstor-server#474) - **skip-adjust-when-device-inaccessible.diff** — Skip DRBD adjust/res file regeneration when child layer device is inaccessible - Upstream: [#471](LINBIT/linstor-server#471) Also updates plunger-satellite script and values.yaml for the new build. ### Release note ```release-note [linstor] Build linstor-server with custom patches for improved disk handling ``` <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added automatic DRBD stall detection and recovery, improving storage resync resilience without manual intervention. * Introduced configurable container image references via Helm values for streamlined deployment. * **Bug Fixes** * Enhanced disk toggle operations with retry and cancellation support for better error handling. * Improved metadata creation during disk state transitions. * Added device accessibility checks to prevent errors when underlying storage devices are unavailable. * Fixed LUKS encryption header sizing for consistent deployment across nodes. <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> <!-- end of auto-generated comment: release notes by coderabbit.ai -->
|
I don't want to sound mean, but is If not using If we already assume that all of the LUKS related options must be set in LINSTOR, a different approach (which honestly I would prefer) is to not ignore those properties but simply take them into account. We have for example the |
|
@ghernadi you're right, I've investigated the root cause and found it: ZFS zvols report Here's the evidence from my production cluster:
# cat /sys/block/zd720/queue/optimal_io_size
33554432
# cat /sys/block/zd720/queue/minimum_io_size
16384
# cat /sys/block/zd720/queue/physical_block_size
16384
# echo -n "pass" | cryptsetup -v --debug luksFormat --type luks2 /dev/zvol/data/test - 2>&1 | grep -i "topology\|offset"
# Topology: IO (16384/33554432), offset = 0; Required alignment is 33554432 bytes.
# Device size 104857600, offset 33554432.
# cryptsetup luksDump /dev/zvol/data/test | grep offset
offset: 33554432 [bytes] # = 32 MiB
# cat /sys/block/loop0/queue/optimal_io_size
0
# cryptsetup luksDump /dev/loop0 | grep offset
offset: 16777216 [bytes] # = 16 MiB (default)So the issue is: when creating LUKS on ZFS zvols without explicit --offset, cryptsetup picks up the This causes inconsistent offsets when nodes run different kernel/zfs versions (with different topology handling behavior), or when new replicas are added after system updates that change the |
Summary
Different systems may create LUKS2 with different default header sizes (16 MiB vs 32 MiB) depending on system configuration (cryptsetup defaults,
/etc/cryptsetup-default.conf, etc.).This causes issues during toggle-disk operations when adding a disk to a diskless resource:
Low.dev. smaller than requested DRBD-dev. sizeerrorExample:
Solution
Always specify
--offset 32768(16 MiB) when creating LUKS devices to ensure consistent header size across all nodes. This is the minimum standard offset for LUKS2.Also exclude
--offsetfromadditionalOptionsto prevent users from accidentally breaking consistency.