roachtest: randomize volume type and XFS#159010
roachtest: randomize volume type and XFS#159010craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
a9e44dc to
32683f0
Compare
68aa2c2 to
f9ad6ed
Compare
|
Backport to 25.2 requires this one to be approved first, and chances the automated backport will fail. |
f9ad6ed to
812fe2a
Compare
Potential Bug(s) DetectedThe three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation. Next Steps: Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary. After you review the findings, please tag the issue as follows:
|
812fe2a to
11f1a53
Compare
Potential Bug(s) DetectedThe three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation. Next Steps: Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary. After you review the findings, please tag the issue as follows:
|
11f1a53 to
070bba4
Compare
070bba4 to
c809b5b
Compare
Prior to this patch, tests had three possiblities for volume types: 1. Specify `PreferLocalSSD()` `DisableLocalSSD()` options. 2. Force a specific volume type via `VolumeType()` (volume type being provider specific, this was only possible for tests that target a single provider). 3. Rely on the default behavior for `PreferLocalSSD()` and use the default volume type in roachprod for the provider in case local SSD was not selected. The way `PreferLocalSSD()` was implemented meant that most of the tests were actually running on local SSDs (when available on the machine type), leading to a gap in the testing strategy with regards to volume types. This patch brings the new ClusterSpec option `RandomizeVolumeType()`. In case volume type is not forced and neither `PreferLocalSSD()` or `DisableLocalSSD()` are specified, specifying `RandomizeVolumeType()` will take precendence over the `--local-ssd` argument and will randomly select a volume type from those available in the targeted provider: - AWS: gp3, io2, local SSD - GCE: pd-ssd, local SSD - Azure: premium-ssd, premium-ssd-v2, ultra-disk, local SSD - IBM: 10iops-tier Note: volume type randomization gives the same weight to all options. This patch also introduces a random chance of provisioning XFS as a filesystem via the option `RandomlyUseXfs()`. The option is built in the same way as `RandomlyUseZfs()`, granting a 20% chance of XFS if present. If both `RandomlyUseZfs()` and `RandomlyUseXfs()` are used, they both get a 20% chance, leaving 60% chance for the default ext4. Epic: none Fixes: 146661 Release note: None
c809b5b to
08b6b00
Compare
nameisbhaskar
left a comment
There was a problem hiding this comment.
Overall looks good! Added a comment to add log lines to the randomise logic code for easy debugging.
| if s.RandomlyUseZfs || s.RandomlyUseXfs { | ||
| rng, _ := randutil.NewPseudoRand() | ||
| if rng.Float64() <= 0.2 { | ||
| randFloat := rng.Float64() |
There was a problem hiding this comment.
It may help if we add some log lines in this logic. Given the number of conditions, it may be difficult to debug.
There was a problem hiding this comment.
We don't have a logger at hand in this code path so it's not really easy to log, especially without messing up the -retry_[1-3] logger used at the caller level.
The drawing logic is fairly straightforward, and the outcome is already exposed via ExposedMetamorphicInfo which will be propagated in the GH issue.
I'd be inclined to just merge this without complicating things and reconsider later if needed.
|
TFTRs! bors r=nameisbhaskar,williamchoe3,herkolategan |
|
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. 💡 Consider backporting to the fork repo instead of the main repo. See instructions for more details. error creating merge commit from 08b6b00 to blathers/backport-release-25.2-159010: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 25.2.x failed. See errors above. 💡 Consider backporting to the fork repo instead of the main repo. See instructions for more details. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
After cockroachdb#159010 introduced metamorphic volume types, an issue could occur when a test requested a specific count via `SSD(n)` or `VolumeCount(n)`, but the randomly selected volume type didn't match the option used. For example, a test specifying `SSD(8)` would get only 1 disk when `RandomizeVolumeType()` selected `pd-ssd`, because it only applied to local SSDs while `VolumeCount` (which controls persistent disk count) defaulted to 0. This patch unifies `SSD()` and `VolumeCount()` into a single `Disks()` option that specifies the number of disks regardless of storage type. The `DiskCount` field replaces both `SSDs` and `VolumeCount` in `ClusterSpec`, and `getGCEOpts` now routes this value to either `SSDCount` or `PDVolumeCount` based on the type of storage used. Multi-disk configurations with persistent disks work on all clouds. Multi-disk local SSDs remain GCE-only; when local SSD is selected on other clouds with DiskCount > 1, the system gracefully falls back to persistent disks. Fixes: cockroachdb#160518 Closes: cockroachdb#160187 Closes: cockroachdb#160205 Closes: cockroachdb#160225 Closes: cockroachdb#160542 Closes: cockroachdb#160695 Closes: cockroachdb#161058 Release note: None
After cockroachdb#159010 introduced metamorphic volume types, an issue could occur when a test requested a specific count via `SSD(n)` or `VolumeCount(n)`, but the randomly selected volume type didn't match the option used. For example, a test specifying `SSD(8)` would get only 1 disk when `RandomizeVolumeType()` selected `pd-ssd`, because it only applied to local SSDs while `VolumeCount` (which controls persistent disk count) defaulted to 0. This patch unifies `SSD()` and `VolumeCount()` into a single `Disks()` option that specifies the number of disks regardless of storage type. The `DiskCount` field replaces both `SSDs` and `VolumeCount` in `ClusterSpec`, and `getGCEOpts` now routes this value to either `SSDCount` or `PDVolumeCount` based on the type of storage used. Multi-disk configurations with persistent disks work on all clouds. Multi-disk local SSDs remain GCE-only; when local SSD is selected on other clouds with DiskCount > 1, the system gracefully falls back to persistent disks. Fixes: cockroachdb#160518 Closes: cockroachdb#160187 Closes: cockroachdb#160205 Closes: cockroachdb#160225 Closes: cockroachdb#160542 Closes: cockroachdb#160695 Closes: cockroachdb#161058 Closes: cockroachdb#161064 Release note: None
161088: roachtest: unify SSD() and VolumeCount() r=williamchoe3,DarrylWong,srosenberg a=golgeek After #159010 introduced metamorphic volume types, an issue could occur when a test requested a specific count via `SSD(n)` or `VolumeCount(n)`, but the randomly selected volume type didn't match the option used. For example, a test specifying `SSD(8)` would get only 1 disk when `RandomizeVolumeType()` selected `pd-ssd`, because it only applied to local SSDs while `VolumeCount` (which controls persistent disk count) defaulted to 0. This patch unifies `SSD()` and `VolumeCount()` into a single `Disks()` option that specifies the number of disks regardless of storage type. The `DiskCount` field replaces both `SSDs` and `VolumeCount` in `ClusterSpec`, and `getGCEOpts` now routes this value to either `SSDCount` or `PDVolumeCount` based on the type of storage used. Multi-disk configurations with persistent disks work on all clouds. Multi-disk local SSDs remain GCE-only; when local SSD is selected on other clouds with DiskCount > 1, the system gracefully falls back to persistent disks. Fixes: #160518 Closes: #160187 Closes: #160205 Closes: #160225 Closes: #160542 Closes: #160695 Closes: #161058 Closes: #161064 Release note: None Co-authored-by: Ludovic Leroux <[email protected]>
After #159010 introduced metamorphic volume types, an issue could occur when a test requested a specific count via `SSD(n)` or `VolumeCount(n)`, but the randomly selected volume type didn't match the option used. For example, a test specifying `SSD(8)` would get only 1 disk when `RandomizeVolumeType()` selected `pd-ssd`, because it only applied to local SSDs while `VolumeCount` (which controls persistent disk count) defaulted to 0. This patch unifies `SSD()` and `VolumeCount()` into a single `Disks()` option that specifies the number of disks regardless of storage type. The `DiskCount` field replaces both `SSDs` and `VolumeCount` in `ClusterSpec`, and `getGCEOpts` now routes this value to either `SSDCount` or `PDVolumeCount` based on the type of storage used. Multi-disk configurations with persistent disks work on all clouds. Multi-disk local SSDs remain GCE-only; when local SSD is selected on other clouds with DiskCount > 1, the system gracefully falls back to persistent disks. Fixes: #160518 Closes: #160187 Closes: #160205 Closes: #160225 Closes: #160542 Closes: #160695 Closes: #161058 Closes: #161064 Release note: None
After cockroachdb#159010 introduced metamorphic volume types, an issue could occur when a test requested a specific count via `SSD(n)` or `VolumeCount(n)`, but the randomly selected volume type didn't match the option used. For example, a test specifying `SSD(8)` would get only 1 disk when `RandomizeVolumeType()` selected `pd-ssd`, because it only applied to local SSDs while `VolumeCount` (which controls persistent disk count) defaulted to 0. This patch unifies `SSD()` and `VolumeCount()` into a single `Disks()` option that specifies the number of disks regardless of storage type. The `DiskCount` field replaces both `SSDs` and `VolumeCount` in `ClusterSpec`, and `getGCEOpts` now routes this value to either `SSDCount` or `PDVolumeCount` based on the type of storage used. Multi-disk configurations with persistent disks work on all clouds. Multi-disk local SSDs remain GCE-only; when local SSD is selected on other clouds with DiskCount > 1, the system gracefully falls back to persistent disks. Fixes: cockroachdb#160518 Closes: cockroachdb#160187 Closes: cockroachdb#160205 Closes: cockroachdb#160225 Closes: cockroachdb#160542 Closes: cockroachdb#160695 Closes: cockroachdb#161058 Closes: cockroachdb#161064 Release note: None
After cockroachdb#159010 introduced metamorphic volume types, an issue could occur when a test requested a specific count via `SSD(n)` or `VolumeCount(n)`, but the randomly selected volume type didn't match the option used. For example, a test specifying `SSD(8)` would get only 1 disk when `RandomizeVolumeType()` selected `pd-ssd`, because it only applied to local SSDs while `VolumeCount` (which controls persistent disk count) defaulted to 0. This patch unifies `SSD()` and `VolumeCount()` into a single `Disks()` option that specifies the number of disks regardless of storage type. The `DiskCount` field replaces both `SSDs` and `VolumeCount` in `ClusterSpec`, and `getGCEOpts` now routes this value to either `SSDCount` or `PDVolumeCount` based on the type of storage used. Multi-disk configurations with persistent disks work on all clouds. Multi-disk local SSDs remain GCE-only; when local SSD is selected on other clouds with DiskCount > 1, the system gracefully falls back to persistent disks. Fixes: cockroachdb#160518 Closes: cockroachdb#160187 Closes: cockroachdb#160205 Closes: cockroachdb#160225 Closes: cockroachdb#160542 Closes: cockroachdb#160695 Closes: cockroachdb#161058 Closes: cockroachdb#161064 Release note: None
Prior to this patch, tests had three possiblities for volume types:
PreferLocalSSD()DisableLocalSSD()options.VolumeType()(volume type being provider specific, this was only possible for tests that target a single provider).PreferLocalSSD()and use the default volume type in roachprod for the provider in case local SSD was not selected.The way
PreferLocalSSD()was implemented meant that most of the tests were actually running on local SSDs (when available on the machine type), leading to a gap in the testing strategy with regards to volume types.This patch brings the new ClusterSpec option
RandomizeVolumeType(). In case volume type is not forced and neitherPreferLocalSSD()orDisableLocalSSD()are specified, specifyingRandomizeVolumeType()will take precendence over the--local-ssdargument and will randomly select a volume type from those available in the targeted provider:Note: volume type randomization gives the same weight to all options.
This patch also introduces a random chance of provisioning XFS as a filesystem via the option
RandomlyUseXfs().The option is built in the same way as
RandomlyUseZfs(), granting a 20% chance of XFS if present.If both
RandomlyUseZfs()andRandomlyUseXfs()are used, they both get a 20% chance, leaving 60% chance for the default ext4.All/most of the KV tests, and a few admission-control are switched to volume type/filesystem randomization as requested in this issue.
Epic: none
Fixes: 146661
Release note: None