Evaluate & Support using dedicated disks for WAL to mitigate IOPS contention #1147

@unmarshall

How to categorize this issue?

/area control-plane
/area performance
/area scalability
/kind enhancement

What would you like to be added:
Today etcd-druid deploys an etcd cluster with a single SSD that stores both WAL and snapshot files. All of these SSDs come with IOPS limits. For clusters with a lot of etcd read/write activity, etcd can slow down significantly, which then causes timeouts in kube-apiserver:

Trace[1428471700]:  ---"Txn call failed" err:etcdserver: request timed out 7015ms (06:23:13.521)]
E0729 06:23:13.532618       1 status.go:71] "Unhandled Error" err="apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:\"etcdserver: request timed out\"}: etcdserver: request timed out" logger="UnhandledError"

Details of one such occurrence can be seen in Live Issue#7539.

Upstream etcd recommends using a dedicated disk for the WAL (https://etcd.io/docs/v2.3/admin_guide/). Since these SSDs have an associated cost, this should be made configurable via the Etcd resource.
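
To make the proposal concrete, here is a minimal sketch of how an optional WAL volume could translate into etcd flags. The type names (`StorageSpec`, `VolumeSpec`), the field names, and the mount paths are all hypothetical and are not the current etcd-druid API; only the `--data-dir` and `--wal-dir` flags are real etcd options.

```go
package main

import "fmt"

// VolumeSpec is a hypothetical description of one persistent volume.
type VolumeSpec struct {
	StorageClass string
	Capacity     string
}

// StorageSpec is a hypothetical fragment of the Etcd resource spec.
// It is an illustration only, not the actual etcd-druid API.
type StorageSpec struct {
	// Data is the volume that holds the etcd data directory.
	Data VolumeSpec
	// WAL is optional. When nil, the WAL stays on the data volume,
	// which matches today's single-disk behaviour.
	WAL *VolumeSpec
}

// Assumed mount paths inside the etcd container (hypothetical).
const (
	dataMountPath = "/var/etcd/data"
	walMountPath  = "/var/etcd/wal"
)

// etcdDirArgs returns the directory-related etcd flags for a member.
// --wal-dir is only set when a dedicated WAL volume is requested, so
// clusters that do not opt in pay for a single disk as before.
func etcdDirArgs(s StorageSpec) []string {
	args := []string{fmt.Sprintf("--data-dir=%s", dataMountPath)}
	if s.WAL != nil {
		args = append(args, fmt.Sprintf("--wal-dir=%s", walMountPath))
	}
	return args
}
```

Keeping the WAL volume as an optional pointer field means existing Etcd resources would remain valid unchanged, and the extra SSD cost is only incurred for clusters that explicitly ask for it.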

Why is this needed:
etcd clusters rely heavily on extremely fast SSDs, and their response times are sensitive to disk performance. For large or busy etcd clusters, IOPS can easily exceed the limits of the SSD in use. To prevent kube-apiserver timeouts, which result in an outage, it is essential to provide an option for individual etcd members to use multiple SSDs.
