
RFC: support embedded global (and local?) topology service #18983

@timvaillancourt

Description


Problem

As a distributed database, Vitess must store various cluster metadata to map out where shards, tablets, etc. are located. We call this the "Topology Service"

Vitess has 2 x types of "topology services":

  • Global - global/cross-cell metadata
  • Local - per-cell metadata (mostly tablet records)

The Vitess topology service is implemented by external, consistent KV stores such as etcd, Consul, and ZooKeeper, and is accessed via the go/vt/topo library (essentially a client wrapper around the KV store implementation). Vitess also has a "control-plane"-like component, VTCtld, which importantly (for this RFC) is a daemon run separately from the "Topology Service"
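
For illustration, here is a minimal sketch (not Vitess code) of what the go/vt/topo wrapper ultimately talks to when etcd backs the global topo: a plain etcd v3 client read under the global root. The endpoint and the /vitess/global root below are example values only.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Connect to the (external) etcd cluster backing the global topo.
	// Endpoint and root path below are example values only.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://etcd-global:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// List the keys stored under the global root, e.g. keyspace and cell
	// records that go/vt/topo would normally read on a caller's behalf.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	resp, err := cli.Get(ctx, "/vitess/global/", clientv3.WithPrefix(), clientv3.WithKeysOnly())
	if err != nil {
		log.Fatal(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Println(string(kv.Key))
	}
}
```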

Similar to Vitess, some competing databases also store cluster metadata in a centralized place:

  1. MongoDB - MongoDB calls these servers "Config Servers"
  2. TiDB - TiDB calls these servers "Placement Drivers"
  3. ...

For these competing databases, the cluster control-plane/metadata service is a single daemon, a single component. I.e.: both the cluster "state" and the "control plane" are one component to install, manage, and debug

In contrast, as already mentioned, Vitess stores state in 2 x different topology clusters (global/local) and the "control plane" is yet another cluster: 3+ clusters to store some state/control stuff 😅. While this approach may have some benefits, a valid criticism of Vitess (my opinion) is that it's complicated to install, especially if everything isn't glued together by a Kube operator

If you draw out a cluster of those competitors on paper, it's really just:

  1. N x Routers
  2. 3-ish x Cluster metadata/control servers
  3. N x Shards

But if you draw out Vitess on paper, it's:

  1. N x Routers
  2. 3-ish x Cluster control servers
  3. 3-ish x Global Topology Service
  4. N x (3-ish x Local Topology Service)
  5. N x Shards
  6. N x VTOrc (ignoring this for now)
  7. (potentially) VTAdmin (ignoring this for now)
    πŸ˜…

Proposal

I feel the de-facto/default Topology Service implementation for Vitess is etcd. etcd is a golang project that supports being embedded in other golang applications, including with the high-availability features the project provides

What this RFC proposes is simple: support running the global topo embedded in the VTCtld process, a process that is already recommended to be deployed/distributed in the same ways (1 per cell, etc) as the currently-external-only global topology service itself
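
As a very rough sketch of what "embedded" could look like (not a concrete implementation, and ignoring the clustering/peer-discovery details VTCtld would have to wire up), etcd's embed package already lets a Go process start an etcd server in-process:

```go
package main

import (
	"log"
	"time"

	"go.etcd.io/etcd/server/v3/embed"
)

func main() {
	// Hypothetical: VTCtld could build this config from its own flags
	// (data dir, listen/advertise URLs, initial cluster of VTCtld peers).
	cfg := embed.NewConfig()
	cfg.Dir = "/vt/vtctld-global-topo"

	e, err := embed.StartEtcd(cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer e.Close()

	select {
	case <-e.Server.ReadyNotify():
		log.Println("embedded etcd (global topo) is ready")
		// At this point go/vt/topo could be pointed at the embedded
		// server's client URLs instead of an external etcd cluster.
	case <-time.After(60 * time.Second):
		e.Server.Stop()
		log.Fatal("embedded etcd took too long to start")
	}

	// Block on fatal server errors.
	log.Fatal(<-e.Err())
}
```

The interesting work would be clustering the embedded etcds of 3-ish x VTCtld processes together and pointing go/vt/topo at their client URLs, but the embedding primitive itself already exists upstream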

The title of this RFC includes the "(and local?)". Storing local topology state in this VTCtld-embedded store would make it global, or at least it would break some of the guarantees that cells provide, so the topic of moving local topo data is avoided for now, but it's also worth discussing. Because Vitess uses strongly-consistent topo reads, this isn't easy to do yet; if we supported "stale" topo reads, we could at least read local-cell topo records during a network partition/loss of quorum, but that's an RFC for another day
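
For context on what "stale" reads could mean with an etcd-backed topo (purely illustrative; none of this plumbing exists in go/vt/topo), etcd's client can already request serializable reads, which are answered from a member's local data without a quorum round-trip:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://etcd-local:2379"}, // example endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// A default Get is linearizable and requires quorum. WithSerializable()
	// reads the local member's state instead, which may be stale but still
	// answers while that member is partitioned away from quorum.
	resp, err := cli.Get(ctx, "/vitess/cell1/tablets/", // example local-cell root
		clientv3.WithPrefix(), clientv3.WithSerializable())
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("read %d tablet records (possibly stale)\n", len(resp.Kvs))
}
```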

Use Case(s)

Users of Vitess who prefer reduced deployment complexity, fewer components to run, etc
