Measure Latencies and Network Behaviors between different endpoints and protocols
Astrolavos (αστρολάβος) is a tool built to measure latencies and network behaviours between different endpoints.
Given an endpoint, astrolavos can run different kinds of measurements against it and either expose the metrics in Prometheus format or send them to a Prometheus push gateway.
Astrolavos comes from the Greek word αστρολάβος, which was a tool that sailors and astronomers used to perform various measurements.
Some might ask why we need another measuring tool. Aren't there enough out there? The honest answer is yes, there are enough out there, probably more than needed. Still, we couldn't find what we needed: initially, something that would break down the latencies of an HTTP request in a similar fashion to httpstat. We started with httptrace measurements and then thought the same approach could be used for any kind of measurement. So here we are with yet another measurement tool that we think might be useful for the community.
Astrolavos is basically a loop that spawns goroutines to execute the different measurements against the different endpoints. You specify the endpoints that you want to measure in the config file that astrolavos reads at boot time. The config file is in YAML format; an example can be found here. Each endpoint entry has the following structure:
- domain: "www.httpbin.org"
  interval: 5s
  https: true
  prober: httptrace
  tag: mytag
  retries: 3
- domain: the IP or domain name that will be used as the target.
- interval: the time period, in seconds, between successive probe attempts. Default is 5 seconds.
- prober: the type of measurement. For now we support httptrace and tcp. The default is httptrace.
- https: for httptrace measurements, whether to use TLS or not.
- tag: the tag that you might want to attach to the Prometheus metrics that astrolavos exposes.
- retries: how many times to attempt the probe. Default is 1 (single attempt, no retries). For production environments that experience cluster scaling events, consider increasing it to 5 or more so that transient failures are handled gracefully with exponential backoff.
Astrolavos implements exponential backoff retry logic when retries is set to 2 or higher. When a probe fails, it automatically retries with increasing delays (100ms, 200ms, 400ms, etc.) before reporting an error. This can eliminate false positives during cluster scaling events or temporary network disruptions.
Note: The default is retries: 1 (no retry) for immediate failure detection. Increase retries if you need resilience during operational events.
For details on configuring retries to handle cluster changes smoothly, see the Smooth Cluster Scaling Guide.
Astrolavos can run in server mode, where it exposes a latency endpoint that another astrolavos deployment in a different cluster can target, and a metrics endpoint that serves its metrics in Prometheus format.
Besides server mode, astrolavos can also run in oneoff mode, where it runs the given measurements once, sends the metrics to a push gateway, and exits. This can be useful for a cronjob setup.
After you have built the binary (you can use make build-local for local use), you can run it by just specifying the path of the config file: ./astrolavos -config-path ./examples.
Astrolavos also supports a oneoff mode, which you can enable by specifying the -oneoff flag.
For more info on the flags, use the -h flag.
$> ./astrolavos -h
Usage of ./bin/astrolavos:
  -config-path string
        Specify the path of the config file. (default "/etc/astrolavos")
  -oneoff
        Run the probe measurements one time and exit.
