This module provides an abstraction named SignalStorage, as well as default implementations for each signal type that allow writing signals to disk and reading them later.
For more detailed information on how the whole process works, take a look at the DESIGN.md file.
The default implementations are the following:
- FileSpanStorage for spans.
- FileMetricStorage for metrics.
- FileLogRecordStorage for logs.
We need to create a signal storage object per signal type to start writing signals to disk. Each
File*Storage implementation has a create() function that receives:
- A File directory to store the signal files. Note that each signal storage object must have a dedicated directory to work properly.
- (Optional) a configuration object.
The available configuration parameters are the following:
- Max file size, defaults to 1MB.
- Max folder size, defaults to 10MB.
- Max age for file writing. It sets the time window where a file can get signals appended to it. Defaults to 30 seconds.
- Min age for file reading. It sets the time to wait before starting to read from a file after its creation. Defaults to 33 seconds. It must be greater than the max age for file writing.
- Max age for file reading. After that time passes, the file will be considered stale and will be removed when new files are created. No more data will be read from a file past this time. Defaults to 18 hours.
- Delete items on iteration. Controls whether items are automatically removed from disk as the iterator advances. Defaults to true. See Deleting data for more details.
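Putting the parameters together, a configuration object might be built like the sketch below. Apart from setDeleteItemsOnIteration, which appears later in this document, the setter names here are assumptions meant to mirror the parameter list above; verify them against FileStorageConfiguration's actual builder API.

```java
// Sketch only: spells out the documented defaults explicitly. Except for
// setDeleteItemsOnIteration, the setter names are assumed, not verified.
FileStorageConfiguration config =
    FileStorageConfiguration.builder()
        .setMaxFileSize(1024 * 1024)                      // 1MB
        .setMaxFolderSize(10 * 1024 * 1024)               // 10MB
        .setMaxFileAgeForWriteMillis(30_000)              // 30 seconds
        .setMinFileAgeForReadMillis(33_000)               // 33 seconds, must exceed the write age
        .setMaxFileAgeForReadMillis(18L * 60 * 60 * 1000) // 18 hours
        .setDeleteItemsOnIteration(true)
        .build();
```

The resulting object can then be passed as the second argument to any File*Storage create() call.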
// Root dir
File rootDir = new File("/some/root");
// Setting up span storage
SignalStorage.Span spanStorage = FileSpanStorage.create(new File(rootDir, "spans"));
// Setting up metric storage
SignalStorage.Metric metricStorage = FileMetricStorage.create(new File(rootDir, "metrics"));
// Setting up log storage
SignalStorage.LogRecord logStorage = FileLogRecordStorage.create(new File(rootDir, "logs"));

While you could manually call your SignalStorage.write(items) function, disk buffering provides convenience exporters that you can use in your OpenTelemetry instance, so that all signals are automatically stored as they are created.
- For a span storage, use a SpanToDiskExporter.
- For a log storage, use a LogRecordToDiskExporter.
- For a metric storage, use a MetricToDiskExporter.
Each wraps a signal storage for its respective signal type and accepts an optional callback that is notified when an export succeeds, fails, or the exporter is shut down.
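A callback implementation might look like the sketch below. Note that this is a hypothetical shape: the interface and method names are invented here for illustration, so check the library's actual callback type before wiring one into setExporterCallback.

```java
import java.util.Collection;

// Hypothetical callback interface, for illustration only; the library's real
// callback type and method names may differ.
interface ExporterCallback<T> {
  void onExportSuccess(Collection<T> items);

  void onExportError(Collection<T> items, Throwable error);

  void onShutdown();
}

// A simple implementation that counts outcomes, e.g. for monitoring or tests.
final class CountingCallback<T> implements ExporterCallback<T> {
  int successes;
  int failures;
  boolean shutDown;

  @Override
  public void onExportSuccess(Collection<T> items) {
    successes++;
  }

  @Override
  public void onExportError(Collection<T> items, Throwable error) {
    failures++;
  }

  @Override
  public void onShutdown() {
    shutDown = true;
  }
}
```

An instance of such a class could play the role of the spanCallback, metricCallback, and logCallback variables in the snippet below.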
// Setting up span to disk exporter
SpanToDiskExporter spanToDiskExporter =
    SpanToDiskExporter.builder(spanStorage).setExporterCallback(spanCallback).build();
// Setting up metric to disk exporter
MetricToDiskExporter metricToDiskExporter =
    MetricToDiskExporter.builder(metricStorage).setExporterCallback(metricCallback).build();
// Setting up log to disk exporter
LogRecordToDiskExporter logToDiskExporter =
    LogRecordToDiskExporter.builder(logStorage).setExporterCallback(logCallback).build();
// Using exporters in your OpenTelemetry instance.
OpenTelemetry openTelemetry =
    OpenTelemetrySdk.builder()
        // Using span to disk exporter
        .setTracerProvider(
            SdkTracerProvider.builder()
                .addSpanProcessor(BatchSpanProcessor.builder(spanToDiskExporter).build())
                .build())
        // Using log to disk exporter
        .setLoggerProvider(
            SdkLoggerProvider.builder()
                .addLogRecordProcessor(
                    BatchLogRecordProcessor.builder(logToDiskExporter).build())
                .build())
        // Using metric to disk exporter
        .setMeterProvider(
            SdkMeterProvider.builder()
                .registerMetricReader(PeriodicMetricReader.create(metricToDiskExporter))
                .build())
        .build();

Now, when creating signals using your OpenTelemetry instance, they will be stored on disk.
In order to read data, we can iterate through our signal storage objects and then forward them to a network exporter. By default, items are automatically deleted from disk as the iterator advances, so a simple iteration is all that's needed:
/**
 * Example of reading and exporting spans from disk.
 *
 * @return true if the export succeeded, or false if it needs to be retried
 */
public boolean exportSpansFromDisk(SpanExporter networkExporter, long timeout) {
  for (Collection<SpanData> spanData : spanStorage) {
    CompletableResultCode resultCode = networkExporter.export(spanData);
    resultCode.join(timeout, TimeUnit.MILLISECONDS);
    if (!resultCode.isSuccess()) {
      logger.trace("Error while exporting", resultCode.getFailureThrowable());
      // Abort the iteration here to avoid consuming batches that were not exported successfully
      return false;
    }
  }
  logger.trace("Finished exporting");
  return true;
}

By default, items are automatically deleted from disk as the iterator advances. You can also clear all data at once by calling SignalStorage.clear().
The default behavior (deleteItemsOnIteration = true) automatically removes items from disk during
iteration. This means you don't need to call Iterator.remove() since the data is cleaned up as the
iterator advances.
If you need more control (e.g., only deleting items after a successful network export), set
deleteItemsOnIteration to false in the configuration:
FileStorageConfiguration config =
    FileStorageConfiguration.builder().setDeleteItemsOnIteration(false).build();

SignalStorage.Span spanStorage = FileSpanStorage.create(new File(rootDir, "spans"), config);

With this setting, items remain on disk until explicitly removed via Iterator.remove():
public boolean exportSpansFromDisk(SpanExporter networkExporter, long timeout) {
  Iterator<Collection<SpanData>> spansIterator = spanStorage.iterator();
  while (spansIterator.hasNext()) {
    CompletableResultCode resultCode = networkExporter.export(spansIterator.next());
    resultCode.join(timeout, TimeUnit.MILLISECONDS);
    if (resultCode.isSuccess()) {
      // Only remove the batch once it has been exported successfully
      spansIterator.remove();
    } else {
      return false;
    }
  }
  return true;
}

Note that even with explicit deletion, disk usage is still bounded by the configured max folder size and max file age: stale files are automatically purged when there's not enough space available before new data is written.
The writing and reading processes can run in parallel because they never operate on the same file. The configurable age parameters guarantee this by defining non-overlapping time windows during which a single file can be written to or read from. On top of that, a mechanism is in place to avoid overlap in edge cases where a time window has ended but the file's resources haven't been released yet. For that mechanism to work properly, this tool assumes that both the reading and the writing actions are executed within the same application process.
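The non-overlapping windows can be illustrated with the default values: for a file created at time T, writes are allowed while the file's age is below 30 seconds, and reads only once it reaches 33 seconds. The sketch below is not part of the library; the class and method names are made up to show the invariant that the min read age exceeding the max write age leaves no instant at which a file is both writable and readable.

```java
// Illustrative only: these constants mirror the documented defaults; the
// library's real implementation and names may differ.
final class FileWindows {
  static final long MAX_WRITE_AGE_MS = 30_000;              // max age for file writing
  static final long MIN_READ_AGE_MS = 33_000;               // min age for file reading
  static final long MAX_READ_AGE_MS = 18L * 60 * 60 * 1000; // max age for file reading

  // A file may receive appended signals only while younger than the write window.
  static boolean isWritable(long creationTime, long now) {
    long age = now - creationTime;
    return age >= 0 && age < MAX_WRITE_AGE_MS;
  }

  // A file may be read only after the min read age and before it goes stale.
  static boolean isReadable(long creationTime, long now) {
    long age = now - creationTime;
    return age >= MIN_READ_AGE_MS && age < MAX_READ_AGE_MS;
  }
}
```

Since MIN_READ_AGE_MS is strictly greater than MAX_WRITE_AGE_MS, the two predicates can never be true at the same instant for the same file, which is why the configuration requires the min read age to exceed the max write age.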
- Cesar Munoz, Elastic
- Jason Plumb, Splunk
Learn more about component owners in component_owners.yml.