Skip to content
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
183 changes: 183 additions & 0 deletions doc/developer/design/20260106_faster_introspection.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,183 @@
# Faster introspection

- Associated: (Insert list of associated epics, issues, or PRs)

<!--
The goal of a design document is to thoroughly discover problems and
examine potential solutions before moving into the delivery phase of
a project. In order to be ready to share, a design document must address
the questions in each of the following sections. Any additional content
is at the discretion of the author.

Note: Feel free to add or remove sections as needed. However, most design
docs should at least keep the suggested sections.
-->

## Problem

<!--
What is the user problem we want to solve?

The answer to this question should link to at least one open GitHub
issue describing the problem.
-->

In Materialize, cluster replicas host the user's workload.
Users observe the status of their workloads via introspection queries that gather information from various parts of the system.
Currently, the queries can be slow or unresponsive when the system is under load.
In the limit, this makes it impossible to observe progress for workloads that never hydrate, and slow for others.

In this design, we want to outline alternatives for improving the performance of introspection queries.
Specifically, we want to improve the performance of queries against _compute introspection_, which present as indexes maintained by the compute layer itself.

## Success criteria

<!--
What does a solution to this problem need to accomplish in order to
be successful?

The criteria should help us verify that a proposed solution would solve
our problem without naming a specific solution. Instead, focus on the
outcomes we hope result from this work. Feel free to list both qualitative
and quantitative measurements.
-->

Users can query introspection data with minimal delay, even when the system is under load.

## Out of Scope

<!--
What does a solution to this problem not need to address in order to be
successful?

It's important to be clear about what parts of a problem we won't be solving
and why. This leads to crisper designs, and it aids in focusing the reviewer.
-->

## Solution Proposal

<!--
What is your preferred solution, and why have you chosen it over the
alternatives? Start this section with a brief, high-level summary.

This is your opportunity to clearly communicate your chosen design. For any
design document, the appropriate level of technical details depends both on
the target reviewers and the nature of the design that is being proposed.
A good rule of thumb is that you should strive for the minimum level of
detail that fully communicates the proposal to your reviewers. If you're
unsure, reach out to your manager for help.

Remember to document any dependencies that may need to break or change as a
result of this work.
-->

In compute, we collect introspection data through Timely's logging mechanism, and replay the data on the same worker that generated it.
This means that when a worker is busy, it cannot process new introspection data or respond to queries.
We cannot preempt operators as Timely follows a cooperative multitasking model.

This implies that we need to separate the introspection data processing from the main query processing path, and move it to a different timely runtime.
This way, even if the main runtime is busy, the introspection data can still be processed and queries can be answered.

* Reserve workers: We can reserve a subset of workers in the cluster for processing introspection data.
These workers would run a separate Timely runtime that only processes introspection data.
This allows us to process the data within the same process, avoiding the need for a separate network connection.
A downside is that not all workers are available for query processing.
This might be justifiable for large replicas, though.
* Sidecar cluster replica: We can run a separate cluster replica that only processes introspection data.
This replica would run a separate Timely runtime that only processes introspection data.
The main cluster replica would send introspection data to the sidecar replica.
An upside is that the original cluster replica would still have all compute resources available as it has at the moment.
A downside is that we introduce a network connection, which are a source of runtime errors.

The separate cluster could maintain the introspection data as indexes, or write it to persist if desired.
This part of the design is orthogonal to the main proposal, and can be decided based on the desired trade-offs.

Separating the introspection data processing into a separate timely runtime has implications on how we can query the data.
Currently, introspection queries are executed in the same cluster replica as the user's workload.
With the proposed design, we need to route introspection queries to a separate cluster replica.

A separate introspection cluster could be used by multiple replicas to process their introspection data.

## Minimal Viable Prototype

<!--
Build and share the minimal viable version of your project to validate the
design, value, and user experience. Depending on the project, your prototype
might look like:

- A Figma wireframe, or fuller prototype
- SQL syntax that isn't actually attached to anything on the backend
- A hacky but working live demo of a solution running on your laptop or in a
staging environment

The best prototypes will be validated by Materialize team members as well
as prospects and customers. If you want help getting your prototype in front
of external folks, reach out to the Product team in #product.

This step is crucial for de-risking the design as early as possible and a
prototype is required in most cases. In _some_ cases it can be beneficial to
get eyes on the initial proposal without a prototype. If you think that
there is a good reason for skipping or delaying the prototype, please
explicitly mention it in this section and provide details on why you'd
like to skip or delay it.
-->

For an MVP, I would prefer to reserve a low number of threads for a separate Timely runtime that is dedicated to processing introspection data, but runs within the same pod.
This allows us to avoid any network overhead and orchestration complexity, which we'd need to solve otherwise.
We can make this opt-in, and could limit it to larger cluster sizes where losing a CPU core would not have a large impact.

All mechanisms that externalize introspection data need the same interface changes, regardless of where we host the data.
This means that even for co-locating introspection data processing on the same pod, we need the same abstraction boundaries as if the introspection data was processed on a different pod.

Specifically, we do the following:
* We introduce a separate Timely runtime for introspection.
It uses the same compute protocol as the compute runtime, and has the same capabilities.
* We introduce a mechanism to externalize the introspection data from a Timely run time, and ingesting it in a separate instance.
For the co-located implementation, we can use shared memory as our transport.
* We change the coordinator to surface a separate, non-modifiable, introspection cluster along with the regular cluster.


### Sidecar clusters

This command crates a new replica and a separate introspection sidecar replica.
```SQL
CREATE CLUSTER REPLICA cluster_name.replica_name (INTROSPECTION SIDECAR = COLOCATED, INTROSPECTION SIDECAR NAME = introspection_cluster_name);
```

We require the sidecar to be a separate _cluster_, and not just a separate _cluster replica_.
Otherwise, users would need to use replica-targeted queries to obtain the relevant information.

### Sidecar cluster replicas

This command crates a new replica and a separate introspection sidecar replica.
Copy link

Copilot AI Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'crates' to 'creates'.

Copilot uses AI. Check for mistakes.
```SQL
CREATE CLUSTER REPLICA cluster_name.replica_name (INTROSPECTION SIDECAR = COLOCATED, INTROSPECTION SIDECAR NAME = introspection_cluster.introspection_replica_name);
```

## Alternatives

<!--
What other solutions were considered, and why weren't they chosen?

This is your chance to demonstrate that you've fully discovered the problem.
Alternative solutions can come from many places, like: you or your Materialize
team members, our customers, our prospects, academic research, prior art, or
competitive research. One of our company values is to "do the reading" and
to "write things down." This is your opportunity to demonstrate both!
-->

## Open questions

<!--
What is left unaddressed by this design document that needs to be
closed out?

When a design document is authored and shared, there might still be
open questions that need to be explored. Through the design document
process, you are responsible for getting answers to these open
questions. All open questions should be answered by the time a design
document is merged.
-->

* Should we maintain the introspection indexes on both the sidecar and the main cluster?
* What's the name of the introspection sidecar?