Prometheus metrics for libp2p protocols by lla-dane · Pull Request #1199 · libp2p/py-libp2p

lla-dane · 2026-02-09T13:51:39Z

Introduction

This pull request introduces Prometheus/Grafana metrics for the core py-libp2p protocols, enabling real-time monitoring and analysis.

It lets developers run a libp2p node and inspect internal protocol behavior (such as latency, message propagation, and DHT activity) directly through standard metrics pipelines.

A working demo (metrics-demo) is included in the examples directory to show how multiple services operate together and how their metrics can be visualized using Prometheus and Grafana.

What's included

The following libp2p services are currently instrumented and exposed via Prometheus metrics:

Ping

ping: Round-trip time (RTT) measurements.
ping_failure: Failed ping attempts.

Provides visibility into peer-to-peer latency and connectivity reliability.

Gossipsub / Pubsub

gossipsub_received_total: Messages received
gossipsub_publish_total: Messages published
gossipsub_subopts_total: Subscription updates
gossipsub_control_total: Control messages
gossipsub_message_bytes: Message sizes

Enables monitoring of message propagation, throughput, and pubsub activity.

Kademlia (Kad-DHT)

kad_inbound_total: Total inbound requests
kad_inbound_find_node: FIND_NODE requests
kad_inbound_get_value: GET_VALUE requests
kad_inbound_put_value: PUT_VALUE requests
kad_inbound_get_providers: GET_PROVIDERS requests
kad_inbound_add_provider: ADD_PROVIDER requests

Swarm / Connection Lifecycle

swarm_incoming_conn: Incoming connections
swarm_incoming_conn_error: Incoming connection failures
swarm_dial_attempt: Outgoing dial attempts
swarm_dial_attempt_error: Dial failures

Tracks connection establishment behavior and network stability.

Demo & Observability Setup

A metrics-demo CLI is included to:

Run a libp2p node with Ping, Gossipsub, and Kad-DHT enabled
Connect multiple nodes and observe interactions
Expose metrics via an HTTP endpoint (localhost:8000)

A Docker-based setup is provided to launch:

Prometheus for metrics scraping
Grafana for visualization dashboards

This allows real-time inspection of protocol-level behavior across nodes.
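For a quick look at the endpoint without Grafana, the text served by the metrics endpoint can be parsed directly. The snippet below is a minimal sketch that assumes the endpoint returns the standard Prometheus text exposition format; `parse_metrics` and the sample values are hypothetical and not part of this PR.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse samples from Prometheus text exposition output into a dict."""
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labelled series: keep 'name{labels}' as the key.
            end = line.rindex("}") + 1
            key, rest = line[:end], line[end:]
        else:
            key, _, rest = line.partition(" ")
        # The value is the first field after the key; a timestamp may follow.
        samples[key] = float(rest.split()[0])
    return samples


# Invented sample; real output from the demo will differ.
SAMPLE = """\
# HELP gossipsub_publish_total Messages published
# TYPE gossipsub_publish_total counter
gossipsub_publish_total 3.0
swarm_dial_attempt 2.0
ping_bucket{le="0.5"} 1.0
"""

for name, value in parse_metrics(SAMPLE).items():
    print(name, value)
```

In practice the input would be the response body fetched from the demo's endpoint on localhost:8000 rather than a hard-coded string.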

Necessity

Currently, diagnosing issues in py-libp2p (e.g., latency spikes, dropped messages, or DHT inconsistencies) relies heavily on logs, which are:

difficult to aggregate
hard to analyze over time
unsuitable for production observability

This PR introduces structured, queryable metrics that:

enable real-time monitoring
integrate with standard observability tooling
make debugging and performance analysis significantly easier

Reference

Inspired by the metrics design in the Rust implementation:
https://github.com/libp2p/rust-libp2p/tree/master/misc/metrics

lla-dane · 2026-02-18T02:02:20Z

metrics-2026-02-08_19.57.51.mp4

Ping latency metrics (histogram) on Grafana

lla-dane · 2026-03-19T14:19:49Z

gossipsub-metrics.mp4

Screencast of the gossipsub metrics. The following metrics are recorded:

gossipsub_received_total: Messages successfully received
gossipsub_publish_total: Messages to be published
gossipsub_subopts_total: Messages notifying peer subscriptions
gossipsub_control_total: Received control messages
gossipsub_message_bytes: Message size in bytes
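As a rough illustration of how per-RPC events could feed these metrics, the sketch below folds events into plain counters. The `GossipsubEvent` stand-in only mirrors the fields set in the pubsub read loop (`peer_id`, `message_size`, `publish`, `subopts`, `control`); which flag increments which metric is an assumption made for illustration, and `collections.Counter` stands in for real Prometheus counters.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class GossipsubEvent:
    """Stand-in for the PR's event type; field names follow the read loop."""

    peer_id: str = ""
    message_size: int = 0
    publish: bool = False
    subopts: bool = False
    control: bool = False


def record(event: GossipsubEvent, counters: Counter, sizes: list[int]) -> None:
    """Fold one event into the counters (assumed mapping, not the PR's code)."""
    # Assumption: every event corresponds to one received RPC.
    counters["gossipsub_received_total"] += 1
    if event.publish:
        counters["gossipsub_publish_total"] += 1
    if event.subopts:
        counters["gossipsub_subopts_total"] += 1
    if event.control:
        counters["gossipsub_control_total"] += 1
    # Sizes would feed gossipsub_message_bytes in a real exporter.
    sizes.append(event.message_size)


counters: Counter = Counter()
sizes: list[int] = []
record(GossipsubEvent(peer_id="peer-a", message_size=10, publish=True), counters, sizes)
record(GossipsubEvent(peer_id="peer-a", message_size=4, control=True), counters, sizes)
print(dict(counters), sizes)
```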

seetadev · 2026-03-24T19:46:05Z

@lla-dane : Hi Abhinav, this is a really strong and impactful PR, great work 👏

Love how you’ve brought Prometheus/Grafana observability directly into py-libp2p, the coverage across Ping, Gossipsub, Kad-DHT, and Swarm gives a solid, end-to-end view of protocol behavior. The metrics feel well chosen and immediately useful for debugging and performance analysis.

The metrics-demo + Docker setup is a big win for DX as well, makes it super easy to spin things up and actually see what’s happening across nodes.

Overall, this is a big step toward production-grade observability for py-libp2p. Happy to help test or review further & excited to see this land. We will discuss this in detail tomorrow.

On the same note, it would be great if you could resolve the CI/CD issues.

seetadev · 2026-03-29T22:33:54Z

@lla-dane : Great work, Abhinav. Please resolve the merge conflicts. Also, add a tracking issue for metrics specific to circuit relay.

Please include a newsfragment. This PR is ready to merge.

lla-dane · 2026-03-30T19:40:32Z

Fixed the merge conflicts and added the newsfragment. @seetadev

pacrob · 2026-03-30T20:06:06Z

libp2p/__init__.py

        return RoutedHost(
            network=swarm,
            router=disc_opt,
            enable_mDNS=enable_mDNS,
            enable_upnp=enable_upnp,
            bootstrap=bootstrap,
            resource_manager=resource_manager,
            bootstrap_allow_ipv6=bootstrap_allow_ipv6,
            bootstrap_dns_timeout=bootstrap_dns_timeout,
            bootstrap_dns_max_retries=bootstrap_dns_max_retries,


Should the RoutedHost branch also get metric_recv_channel defined?

lla-dane (Apr 2, 2026): I was unsure about this, so I attached it to RoutedHost, but it seems redundant. I will remove it.

pacrob · 2026-03-30T20:23:04Z

In general, there are a lot of changes being made and code being added with very little testing. Large chunks of the code you're adding could be deleted and we'd never know from the CI run.

It also appears that the docker-compose.yml being added is only for the demo. It should live with the examples, not within the module itself.

lla-dane · 2026-04-02T06:09:42Z

> In general, there are a lot of changes being made and code being added with very little testing. Large chunks of the code you're adding could be deleted and we'd never know from the CI run.

The major internal change was adding a new component, metric_send_channel, to the INetStream object, which required changes in a lot of files. The other changes were creating the metric event objects in the respective protocol files, adding the metrics module itself, and updating a few tests that were failing because of the new component in INetStream.

Would you please flag one or two places, so I get an idea of the redundant chunks? I will then start removing them. Thanks! @pacrob

acul71 · 2026-04-02T13:07:12Z

Thanks @lla-dane for the PR — this is strong and impactful work. The protocol coverage (Ping, Gossipsub, Kad-DHT, Swarm), demo setup, and documentation updates make observability much more practical for py-libp2p users.

Required improvements

Fix libp2p/metrics/_init__.py -> libp2p/metrics/__init__.py to avoid packaging/import issues in built distributions.
Clarify and fix metrics behavior for RoutedHost path (disc_opt): either wire metric_recv_channel end-to-end or explicitly reject/disable metrics in this mode.

Suggested improvements

Add focused tests for metrics event plumbing and protocol event emission (stream/pubsub/kad/swarm paths).
Replace print() in metrics startup helper with logger-based info messages for operational consistency.
Add explicit issue linkage in PR description (e.g., Fixes #1199) for traceability.

1. Summary of Changes

This PR adds Prometheus-based observability across core protocols and runtime flows in py-libp2p, including ping, pubsub/gossipsub, Kademlia DHT, and swarm connection lifecycle. It introduces a new metrics module under libp2p/metrics/, wires metric channels through new_host()/new_swarm() and stream interfaces, adds a metrics-demo example stack (including Docker compose under examples/metrics/), and extends docs (docs/examples.metrics.rst, docs/libp2p.metrics.rst) plus a newsfragment (newsfragments/1199.feature.rst).

Related issue context exists as issue #1199 (same title/content as PR), but the PR description does not explicitly link it with Fixes #1199/Closes #1199 language.

No explicit breaking API changes are declared, but interface surface was expanded (INetStream.metric_send_channel, IHost.get_metrics_recv_channel).

2. Branch Sync Status and Merge Conflicts

Branch Sync Status

Status: ℹ️ Ahead of origin/main
Details: 0\t17 in branch_sync_status.txt (0 commits behind, 17 commits ahead)

Merge Conflict Analysis

Conflicts Detected: ✅ No conflicts
Evidence: merge_conflict_check.log shows Already up to date. and === NO MERGE CONFLICTS DETECTED ===

✅ No merge conflicts detected. The PR branch can be merged cleanly into origin/main.

3. Strengths

Broad protocol instrumentation is well-scoped and consistent with the stated observability goals.
Developer experience is improved with a runnable demo (metrics-demo) and accompanying docs.
CI quality gates are currently green in this branch run:
- make lint: passed
- make typecheck: passed
- make test: 2610 passed, 15 skipped, 24 warnings
- make linux-docs: passed with -W (no doc warnings/errors)
Reviewer feedback about moving Docker compose to examples appears addressed (examples/metrics/docker-compose.yml).
Newsfragment was added and named per expected convention (1199.feature.rst).

4. Issues Found

Critical

File: libp2p/metrics/_init__.py
Line(s): file-level
Issue: The metrics package uses _init__.py instead of __init__.py. With current setuptools package discovery ([tool.setuptools.packages.find]), this risks excluding libp2p/metrics from built distributions, causing import/runtime failures for installed packages despite passing local source-tree tests.
Suggestion: Rename libp2p/metrics/_init__.py to libp2p/metrics/__init__.py and verify packaging via build/install smoke test.
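
A cheap source-tree check can catch this class of typo before a build is attempted. The sketch below is a hypothetical helper (`find_misnamed_init` is not part of the repository) that flags files whose name is a near-miss of `__init__.py`; it complements the build/install smoke test rather than replacing it.

```python
from pathlib import Path


def find_misnamed_init(root: str) -> list[str]:
    """Return files under root that look like a mistyped __init__.py."""
    suspects = []
    for path in Path(root).rglob("*.py"):
        stem = path.stem
        # The real package marker is exactly "__init__". Flag underscore
        # near-misses such as "_init__", "__init_" or "_init_".
        if stem != "__init__" and "_" in stem and stem.strip("_") == "init":
            suspects.append(path.relative_to(root).as_posix())
    return sorted(suspects)


if __name__ == "__main__":
    # Scan the current directory; an empty list means nothing suspicious.
    print(find_misnamed_init("."))
```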

Major

RoutedHost vs BasicHost: `metric_recv_channel` not wired

File: libp2p/__init__.py
Line(s): new_host() — RoutedHost vs BasicHost return paths
Issue: enable_metrics=True opens a memory channel pair and passes the send end into the swarm, but only BasicHost receives the receive end. RoutedHost never gets metric_recv_channel, so get_metrics_recv_channel() stays None for DHT/routed setups even though the swarm still holds a send channel.
Suggestion: Plumb metric_recv_channel through RoutedHost.__init__ into super().__init__(..., metric_recv_channel=...), or reject enable_metrics when disc_opt is not None with a clear error or warning.

Channels are created and the swarm is wired:

libp2p/__init__.py — lines 521–560

    # Metric emit/consume endpoints
    metric_send_channel, metric_recv_channel = None, None
    if enable_metrics:
        metric_send_channel, metric_recv_channel = trio.open_memory_channel(100)

    # Enable automatic protection by default: if no resource manager is supplied,
    # create a default instance so connections/streams are guarded out of the box.
    if resource_manager is None:
        try:
            from libp2p.rcmgr import new_resource_manager as _new_rm

            resource_manager = _new_rm()
        except Exception:
            # Fallback to leaving it None if creation fails for any reason.
            resource_manager = None

    # Determine the connection config to use
    # QUIC transport config takes precedence if QUIC is enabled
    effective_config: ConnectionConfig | QUICTransportConfig | None
    if enable_quic and quic_transport_opt is not None:
        effective_config = quic_transport_opt
    else:
        effective_config = connection_config

    swarm = new_swarm(
        enable_quic=enable_quic,
        key_pair=key_pair,
        muxer_opt=muxer_opt,
        sec_opt=sec_opt,
        peerstore_opt=peerstore_opt,
        enable_autotls=enable_autotls,
        muxer_preference=muxer_preference,
        listen_addrs=listen_addrs,
        connection_config=effective_config,
        tls_client_config=tls_client_config,
        tls_server_config=tls_server_config,
        resource_manager=resource_manager,
        psk=psk,
        metric_send_channel=metric_send_channel
    )

The routed path omits metric_recv_channel; the basic path passes it:

libp2p/__init__.py — lines 562–587

    if disc_opt is not None:
        return RoutedHost(
            network=swarm,
            router=disc_opt,
            enable_mDNS=enable_mDNS,
            enable_upnp=enable_upnp,
            bootstrap=bootstrap,
            resource_manager=resource_manager,
            bootstrap_allow_ipv6=bootstrap_allow_ipv6,
            bootstrap_dns_timeout=bootstrap_dns_timeout,
            bootstrap_dns_max_retries=bootstrap_dns_max_retries,
            announce_addrs=announce_addrs,
        )
    return BasicHost(
        network=swarm,
        enable_mDNS=enable_mDNS,
        bootstrap=bootstrap,
        enable_upnp=enable_upnp,
        negotiate_timeout=negotiate_timeout,
        resource_manager=resource_manager,
        metric_recv_channel=metric_recv_channel,
        bootstrap_allow_ipv6=bootstrap_allow_ipv6,
        bootstrap_dns_timeout=bootstrap_dns_timeout,
        bootstrap_dns_max_retries=bootstrap_dns_max_retries,
        announce_addrs=announce_addrs,
    )

RoutedHost does not accept or forward metric_recv_channel:

libp2p/host/routed_host.py — lines 36–77

    def __init__(
        self,
        network: INetworkService,
        router: IPeerRouting,
        enable_mDNS: bool = False,
        enable_upnp: bool = False,
        bootstrap: list[str] | None = None,
        resource_manager: ResourceManager | None = None,
        *,
        bootstrap_allow_ipv6: bool = False,
        bootstrap_dns_timeout: float = 10.0,
        bootstrap_dns_max_retries: int = 3,
        announce_addrs: Sequence[multiaddr.Multiaddr] | None = None,
    ):
        """
        Initialize a RoutedHost instance.

        :param network: Network service implementation
        :param router: Peer routing implementation
        :param enable_mDNS: Enable mDNS discovery
        :param enable_upnp: Enable UPnP port mapping
        :param enable_autotls: Enable AutoTLS certificate provisioning.
        :param bootstrap: Bootstrap peer addresses
        :param resource_manager: Optional resource manager instance
        :type resource_manager: :class:`libp2p.rcmgr.ResourceManager` or None
        :param bootstrap_allow_ipv6: If True, bootstrap uses IPv6+TCP when available.
        :param bootstrap_dns_timeout: DNS resolution timeout in seconds per attempt.
        :param bootstrap_dns_max_retries: Max DNS resolution retries (with backoff).
        :param announce_addrs: If set, replace listen addrs in get_addrs()
        """
        super().__init__(
            network=network,
            enable_mDNS=enable_mDNS,
            enable_upnp=enable_upnp,
            bootstrap=bootstrap,
            resource_manager=resource_manager,
            bootstrap_allow_ipv6=bootstrap_allow_ipv6,
            bootstrap_dns_timeout=bootstrap_dns_timeout,
            bootstrap_dns_max_retries=bootstrap_dns_max_retries,
            announce_addrs=announce_addrs,
        )
        self._router = router
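
The first option in the suggestion above (forwarding the receive channel through RoutedHost) could look roughly like the sketch below. Both classes are simplified stand-ins that model only the metrics-related argument; the keyword name `metric_recv_channel` and the accessor `get_metrics_recv_channel` follow the names used in this PR, and everything else is omitted.

```python
from __future__ import annotations

from typing import Any


class BasicHost:
    """Stand-in for the real BasicHost; only the metrics wiring is modelled."""

    def __init__(
        self, network: Any, *, metric_recv_channel: Any | None = None
    ) -> None:
        self._network = network
        self._metric_recv_channel = metric_recv_channel

    def get_metrics_recv_channel(self) -> Any | None:
        return self._metric_recv_channel


class RoutedHost(BasicHost):
    """Stand-in for RoutedHost that forwards the channel to its base class."""

    def __init__(
        self,
        network: Any,
        router: Any,
        *,
        metric_recv_channel: Any | None = None,
    ) -> None:
        # Forward the receive end so routed setups expose it as well.
        super().__init__(network, metric_recv_channel=metric_recv_channel)
        self._router = router


channel = object()  # placeholder for the receive end of the memory channel
host = RoutedHost(network=None, router=None, metric_recv_channel=channel)
print(host.get_metrics_recv_channel() is channel)
```

With this shape, new_host() would pass metric_recv_channel in both return paths, so get_metrics_recv_channel() behaves the same for routed and basic hosts.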

PR body: missing explicit issue closure line

File: PR metadata (PR #1199)
Line(s): PR description (GitHub UI, not in-repo)
Issue: The description mirrors issue #1199 but does not use a closing keyword, so automation and reviewers cannot rely on “linked / closed on merge” from the body alone.
Suggestion: Add one line at the top or bottom of the PR body, for example:

GitHub PR description — suggested line to append

Fixes #1199

(or Closes #1199 per project convention.)

Metrics in hot paths: few tests assert events or plumbing

Files: libp2p/pubsub/pubsub.py, libp2p/kad_dht/kad_dht.py, libp2p/network/swarm.py, stream construction
Line(s): multiple
Issue: Instrumentation runs on every RPC read, DHT inbound handling, and dial/listen paths, but there is no dedicated test module that opens a channel, drives one protocol action, and asserts the shape/count of events on the receive side.
Suggestion: Add tests that (1) assert NetStream.metric_send_channel is set when enable_metrics=True, (2) drive a minimal gossipsub RPC and assert a GossipsubEvent on the receive channel, (3) trigger dial_peer failure/success and assert SwarmEvent fields, (4) exercise DHT handle_stream and assert KadDhtEvent.

Pubsub: one GossipsubEvent per RPC loop iteration, sent if the stream has a channel:

libp2p/pubsub/pubsub.py — lines 512–558

                event = GossipsubEvent()
                event.peer_id = peer_id.pretty()
                event.message_size = len(incoming)

                if rpc_incoming.publish:
                    # deal with RPC.publish
                    event.publish = True
                    for msg in rpc_incoming.publish:
                        if not self._is_subscribed_to_msg(msg):
                            continue
                        logger.debug(
                            "received `publish` message %s from peer %s", msg, peer_id
                        )
                        # Only schedule task if service is still running
                        if self.manager.is_running:
                            self.manager.run_task(self.push_msg, peer_id, msg)

                if rpc_incoming.subscriptions:
                    # deal with RPC.subscriptions
                    # We don't need to relay the subscription to our
                    # peers because a given node only needs its peers
                    # to know that it is subscribed to the topic (doesn't
                    # need everyone to know)
                    event.subopts = True
                    for message in rpc_incoming.subscriptions:
                        logger.debug(
                            "received `subscriptions` message %s from peer %s",
                            message,
                            peer_id,
                        )
                        self.handle_subscription(peer_id, message)

                # NOTE: Check if `rpc_incoming.control` is set through `HasField`.
                #   This is necessary because `control` is an optional field in pb2.
                #   Ref: https://developers.google.com/protocol-buffers/docs/reference/python-generated#singular-fields-proto2  # noqa: E501
                if rpc_incoming.HasField("control"):
                    event.control = True
                    # Pass rpc to router so router could perform custom logic
                    logger.debug(
                        "received `control` message %s from peer %s",
                        rpc_incoming.control,
                        peer_id,
                    )
                    await self.router.handle_rpc(rpc_incoming, peer_id)

                if stream.metric_send_channel is not None:
                    await stream.metric_send_channel.send(event)

Swarm: dial attempt and failure paths emit SwarmEvent on the swarm-level channel:

libp2p/network/swarm.py — lines 494–572

        # Emit metric-event for dial-attempt
        event = SwarmEvent()
        event.peer_id = peer_id.pretty()
        event.dial_attempt = True

        if self.metric_send_channel is not None:
            await self.metric_send_channel.send(event)

        # Check if we already have connections
        existing_connections = self.get_connections(peer_id)
        if existing_connections:
            # Filter out closed connections
            valid_connections = [c for c in existing_connections if not c.is_closed]
            if valid_connections:
                logger.debug(f"Reusing existing connections to peer {peer_id}")
                return valid_connections

        logger.debug("attempting to dial peer %s", peer_id)

        try:
            # Get peer info from peer store
            addrs = self.peerstore.addrs(peer_id)
        except PeerStoreError as error:
            raise SwarmException(f"No known addresses to peer {peer_id}") from error

        if not addrs:
            raise SwarmException(f"No known addresses to peer {peer_id}")

        # Filter addresses through connection gate (InterceptAddrDial)
        gate = self.connection_gate
        allowed_addrs = []
        for addr in addrs:
            if await gate.is_allowed(addr):
                allowed_addrs.append(addr)

        if not allowed_addrs:
            raise SwarmException(
                f"All addresses for peer {peer_id} blocked by connection gate"
            )

        connections = []
        exceptions: list[SwarmException] = []

        # Try all allowed addresses with retry logic
        for multiaddr in allowed_addrs:
            try:
                connection = await self._dial_with_retry(multiaddr, peer_id)
                connections.append(connection)

                # Limit number of connections per peer
                if len(connections) >= self.connection_config.max_connections_per_peer:
                    break

            except SwarmException as e:
                exceptions.append(e)
                logger.debug(
                    "encountered swarm exception when trying to connect to %s, "
                    "trying next address...",
                    multiaddr,
                    exc_info=e,
                )

        if not connections:
            # Tried all addresses, raising exception.

            # Emit metric-event for dial_attempt failure
            event = SwarmEvent()
            event.peer_id = peer_id.pretty()
            event.dial_attempt_error = True

            if self.metric_send_channel is not None:
                await self.metric_send_channel.send(event)

            raise SwarmDialAllFailedError(
                f"unable to connect to {peer_id}, no addresses established a "
                "successful connection (with exceptions)",
                peer_id=peer_id,
                num_addrs_tried=len(exceptions),
            ) from MultiError(exceptions)

        return connections

Kad-DHT: after handling an inbound message, events go out on the stream’s channel:

libp2p/kad_dht/kad_dht.py — lines 880–882

            if stream.metric_send_channel is not None:
                await stream.metric_send_channel.send(event)

(That block sits after the main try/except around protobuf handling in handle_stream.)
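
The kind of test suggested above (open a channel, drive one action, assert the events on the receive side) can be sketched as follows. This is not a test against the real Swarm: `FakeSwarm` is a hypothetical stand-in that emits events in the same order as the dial path shown above, and `asyncio.Queue` replaces the trio memory channel so the sketch runs on the standard library alone.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass
class SwarmEvent:
    """Stand-in with the fields set in the dial path."""

    peer_id: str = ""
    dial_attempt: bool = False
    dial_attempt_error: bool = False


class FakeSwarm:
    """Hypothetical swarm whose dials always fail after trying all addresses."""

    def __init__(self, metric_send_channel: asyncio.Queue | None) -> None:
        self.metric_send_channel = metric_send_channel

    async def dial_peer(self, peer_id: str) -> None:
        if self.metric_send_channel is not None:
            await self.metric_send_channel.put(
                SwarmEvent(peer_id=peer_id, dial_attempt=True)
            )
        # Pretend every address was tried and none connected.
        if self.metric_send_channel is not None:
            await self.metric_send_channel.put(
                SwarmEvent(peer_id=peer_id, dial_attempt_error=True)
            )
        raise ConnectionError(f"unable to connect to {peer_id}")


async def check_failed_dial_events() -> list[SwarmEvent]:
    channel: asyncio.Queue = asyncio.Queue()
    swarm = FakeSwarm(channel)
    try:
        await swarm.dial_peer("peer-a")
    except ConnectionError:
        pass
    else:
        raise AssertionError("dial was expected to fail")
    # Drain the receive side and check shape and order of the events.
    events = [channel.get_nowait() for _ in range(channel.qsize())]
    assert [e.dial_attempt for e in events] == [True, False]
    assert [e.dial_attempt_error for e in events] == [False, True]
    return events


if __name__ == "__main__":
    print(asyncio.run(check_failed_dial_events()))
```

A real test would build the swarm with metrics enabled and read from the actual receive channel; the assertions on event order and fields would stay the same.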

Minor

`print()` in Prometheus startup helper

File: libp2p/metrics/metrics.py
Line(s): start_prometheus_server()
Issue: User-facing startup hints use print() instead of the project’s logging pattern, so log level, formatters, and production log aggregation do not apply.
Suggestion: Use logger = logging.getLogger(__name__) and logger.info(...) for the same messages (or document that this helper is CLI-only and keep prints behind an explicit verbose flag).

libp2p/metrics/metrics.py — lines 42–60

    async def start_prometheus_server(
        self,
        metric_recv_channel: trio.MemoryReceiveChannel[Any],
    ) -> None:
        metrics = find_available_port(8000)
        prometheus = find_available_port(9000)
        grafana = find_available_port(7000)

        start_http_server(metrics)

        print(f"\nPrometheus metrics visible at: http://localhost:{metrics}")

        print(
            "\nTo start prometheus and grafana dashboards, from another terminal: \n"
            f"PROMETHEUS_PORT={prometheus} GRAFANA_PORT={grafana} docker compose up\n"
            "\nAfter this:\n"
            f"Prometheus dashboard will be visible at: http://localhost:{prometheus}\n"
            f"Grafana dashboard will be visible at: http://localhost:{grafana}\n"
        )
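
The logger-based variant suggested above might look like the sketch below. `announce_endpoints` is a hypothetical helper name, and the logger name is hard-coded here only so the snippet is self-contained; inside the module it would normally be `logging.getLogger(__name__)`.

```python
import logging

# Hard-coded for the sketch; the module itself would use __name__.
logger = logging.getLogger("libp2p.metrics")


def announce_endpoints(metrics_port: int, prometheus_port: int, grafana_port: int) -> None:
    """Log the startup hints that are currently emitted with print()."""
    logger.info("Prometheus metrics visible at: http://localhost:%d", metrics_port)
    logger.info(
        "To start Prometheus and Grafana from another terminal: "
        "PROMETHEUS_PORT=%d GRAFANA_PORT=%d docker compose up",
        prometheus_port,
        grafana_port,
    )
    logger.info("Prometheus dashboard: http://localhost:%d", prometheus_port)
    logger.info("Grafana dashboard: http://localhost:%d", grafana_port)


if __name__ == "__main__":
    # Without a configured handler at INFO level, these messages stay silent.
    logging.basicConfig(level=logging.INFO)
    announce_endpoints(8000, 9000, 7000)
```

Because the messages go through logging, the demo CLI can opt in with basicConfig while library users keep control over level and formatting.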

5. Security Review

No direct critical security vulnerability was identified in this diff.

Potential considerations:

Risk: Metrics endpoint exposure could unintentionally leak operational metadata if bound publicly in production.
Impact: Low to Medium (environment-dependent).
Mitigation: Document secure deployment defaults (bind to localhost/restricted interfaces, firewall guidance, auth proxy if needed).
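
To make the "bind to localhost" default concrete, the sketch below binds a placeholder HTTP endpoint to the loopback interface only. It uses the standard library's `http.server` as a stand-in for the real exporter, and `make_loopback_server` and `MetricsHandler` are hypothetical names. `start_http_server` in prometheus_client also accepts an `addr` argument, so the equivalent there would presumably be `start_http_server(port, addr="127.0.0.1")`.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves a placeholder body; a real exporter would render its registry."""

    def do_GET(self) -> None:
        body = b"# placeholder metrics body\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: object) -> None:
        # Keep the sketch quiet; a real service would log requests.
        pass


def make_loopback_server(port: int = 0) -> ThreadingHTTPServer:
    """Create (but do not start) a server reachable only from this machine."""
    # Binding to 127.0.0.1 instead of "" or 0.0.0.0 keeps the endpoint
    # off external interfaces. Port 0 lets the OS pick a free port.
    return ThreadingHTTPServer(("127.0.0.1", port), MetricsHandler)


if __name__ == "__main__":
    server = make_loopback_server()
    print("bound to", server.server_address)
    server.server_close()
```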

6. Documentation and Examples

Documentation and example coverage is good and materially improved:

Added dedicated docs pages and index wiring for metrics.
Added runnable examples/metrics with README and orchestration.

One documentation gap remains:

Clarify routed-host metrics behavior (disc_opt path) if metrics are unsupported or partially supported there.

7. Newsfragment Requirement

✅ Newsfragment present: newsfragments/1199.feature.rst
✅ Filename format looks valid (<ISSUE_NUMBER>.<TYPE>.rst)
⚠️ PR body should still explicitly reference the issue (Fixes #1199 etc.) to satisfy issue-linking policy in a clear/auditable way.

8. Tests and Validation

Lint (`make lint`)

Result: ✅ Passed
Findings: No errors/warnings in command output.

Typecheck (`make typecheck`)

Result: ✅ Passed
Findings: No type errors/warnings in command output.

Test (`make test`)

Result: ✅ Passed
Summary: 2610 passed, 15 skipped, 24 warnings in 97.43s
Warnings observed:
- PytestCollectionWarning from tests/core/records/test_ipns_validator.py (TestVector class collection), repeated in warning summary.

Docs (`make linux-docs`)

Result: ✅ Passed
Sphinx invoked with -W; no doc build warnings/errors failed the build.

9. Recommendations for Improvement

Fix package initialization file naming (__init__.py) for libp2p/metrics before merge.
Make routed-host metrics behavior explicit and consistent (wire channel or fail fast).
Add direct tests for metric-event emission/plumbing in protocol paths.
Add explicit issue-closing reference in PR body (Fixes #1199).
Consider logger-based startup messaging in metrics runtime module.

10. Questions for the Author

Is metrics support intended to work for RoutedHost (disc_opt path), or should it be unsupported by design?
Was packaging/install behavior validated for libp2p.metrics outside editable/source mode (wheel install)?
Can you add at least a small protocol-level test set proving metric events are emitted for ping/pubsub/kad/swarm paths?
Will you add explicit issue linkage (Fixes #1199) in the PR body for merge policy compliance?

11. Overall Assessment

Quality Rating: Needs Work
Security Impact: Low
Merge Readiness: Needs fixes
Confidence: High

lla-dane · 2026-04-02T13:48:24Z

Sure sure, thanks for flagging the issues @acul71, will fix them shortly.

pacrob · 2026-04-02T15:40:10Z

> Would you please flag one or two places, so I get an idea of the redundant chunks? I will then start removing them. Thanks! @pacrob

Sorry, I was unclear. It's not that your code is redundant. Because your code is not hit when tests are run, there's no way for us to know if some future PR changes or breaks the work you've done here.

lla-dane · 2026-04-02T16:24:16Z

> Sorry, I was unclear. It's not that your code is redundant. Because your code is not hit when tests are run, there's no way for us to know if some future PR changes or breaks the work you've done here.

Aah I see, I misunderstood. I will start writing tests so that all of my code is included in the CI runs for future PRs.

lla-dane force-pushed the metrics branch from 8724ca0 to 5d6c84f Compare February 27, 2026 09:13

lla-dane force-pushed the metrics branch from 379c323 to 21484f0 Compare March 11, 2026 09:32

lla-dane force-pushed the metrics branch 4 times, most recently from 3ab8490 to 1592d66 Compare March 22, 2026 15:39

lla-dane marked this pull request as ready for review March 23, 2026 05:04

lla-dane changed the title ~~WIP: Prometheus metrics for libp2p protocols~~ Prometheus metrics for libp2p protocols Mar 23, 2026

lla-dane force-pushed the metrics branch 2 times, most recently from f9d9854 to 66fd7d6 Compare March 26, 2026 15:04

lla-dane force-pushed the metrics branch from 66fd7d6 to 0cb59ee Compare March 30, 2026 19:38

pacrob reviewed Mar 30, 2026

lla-dane added 12 commits April 2, 2026 11:35

feat: ping latency metrics

8a52608

feat: Attached promtheus/grafana services with docker

aa97194

feat: gossipsub metrics infra

3ba7696

feat: dcutr metrics infra

394a074

chore: fix formatting

2ca1cd0

feat: relay and kad-dht metrics

a68ce1c

chore: fix formatting

d55eb66

feat: swarm metrics and per-protocol inbound and outbound bandwidth usage in prometheus

c15df25

chore: fix formatting

479e7be

feat: integrated gossipsub metrics with cli runtime

42737c6

feat: fixed bugs in kad-dht metrics code

c20af67

fix: kad-dht metrics working now

6b1177a

lla-dane added 5 commits April 2, 2026 11:35

feat: added metrics for swarm-connection cycle

4946dc7

chore: fixed all linter errors

aa4e129

added newsfragment file

8355024

chore: remove redundancies

5c019f9

migrate docker-compose file to examples/metrics

26017c1

lla-dane force-pushed the metrics branch from 0cb59ee to 26017c1 Compare April 2, 2026 06:05
