Fix/traffic issue optimizations#663
Fix/traffic issue optimizations#663moehamade wants to merge 5 commits intopermissionlesstech:mainfrom
Conversation
|
This is awesome @moehamade. Following our discussion there is another potential area of improvement:
|
This commit introduces a robust diagnostics and metrics reporting system for the Nostr client to help monitor performance, detect issues like subscription leaks, and debug data usage. The new system is split into two main components: - **`NostrMetricsCollector`**: A thread-safe class dedicated to collecting detailed metrics, including bytes sent/received per relay, event counts per subscription, reconnection counts, and overall uptime. - **`NostrDiagnosticsReporter`**: A new class that uses the collected metrics to generate comprehensive reports and perform periodic health checks. Key changes: - **Comprehensive Reporting**: Adds the ability to generate a detailed diagnostics report (`generateDiagnosticsReport()`) covering data transfer, subscription age, leak detection, top bandwidth-consuming relays, and health recommendations. - **Periodic Health Checks**: Implements an optional, periodic health check that logs a summary of key metrics and warns about potential issues like old subscriptions or high bandwidth usage. - **Detailed Subscription Logging**: When diagnostic logging is enabled, creating and closing subscriptions now logs detailed information, including a stack trace, to help identify the origin of subscriptions and debug leaks. - **Refactoring for Separation of Concerns**: Logic for metrics collection and reporting has been moved out of `NostrRelayManager` and into the new dedicated classes, cleaning up the manager. `NostrRelayManager` now delegates these tasks. - **Subscription Consistency Fix**: The subscription validation routine is improved to not only re-subscribe to missing subscriptions but also to actively clean up (un-subscribe) extra or orphaned subscriptions found on a relay, improving robustness. - **Dynamic Relay Count**: Introduces a helper function (`optimalRelayCount`) to dynamically determine the optimal number of relays to connect to based on a geohash's precision, aiming to balance coverage and resource usage.
This commit refactors the Nostr relay selection logic to be dynamic based on geohash precision, improving network efficiency. It also introduces diagnostic logging for the Nostr relay manager in debug builds. Key changes: - **Dynamic Relay Count**: Introduces `NostrRelayManager.optimalRelayCount(geohash)` to determine the number of relays to use based on the length of the geohash. This replaces a hardcoded value of 5, allowing for more targeted and efficient event propagation. - **Optimized Subscriptions**: All subscriptions and event publications for geohashes now use this dynamic relay count, reducing unnecessary network traffic for lower-precision (larger area) geohashes. - **Debug Logging**: Enables diagnostic logging for `NostrRelayManager` in `DEBUG` builds, making it easier to troubleshoot relay communication issues during development.
bde6913 to
6071c20
Compare
This commit refactors the geohash subscription logic to separate chat messages (kind 20000) from presence heartbeats (kind 20001). This allows the app to manage them independently, improving efficiency and reducing background data usage. Key changes: - **Separated Subscription Logic**: `NostrSubscriptionManager` and `NostrFilter` now have dedicated functions for subscribing to geohash messages and presence events separately. - **Lifecycle-Aware Presence**: The `GeohashViewModel` now only subscribes to presence events when the app is in the foreground, pausing the subscription when the app is backgrounded to save data. The message subscription remains active. - **Fixed Relay Count**: The dynamic relay count based on geohash precision has been replaced with a fixed count of 3 for all geohash subscriptions to balance reliability and bandwidth. - **Improved Cleanup**: The `reset()` function in `GeohashViewModel` is now more robust, ensuring all subscriptions and background jobs are cleared correctly before re-initialization.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7043fe5f16
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| private const val MAX_RECONNECT_ATTEMPTS = com.bitchat.android.util.AppConstants.Nostr.MAX_RECONNECT_ATTEMPTS | ||
|
|
||
|
|
||
| fun optimalRelayCount(geohash: String): Int = 3 |
There was a problem hiding this comment.
Implement geohash-aware relay sizing
optimalRelayCount now drives every geohash send/subscribe path, but this implementation ignores its geohash input and always returns 3. That under-connects narrow geohashes (reduced coverage and higher chance of missed low-volume events) while over-connecting broad geohashes (extra bandwidth), so the adaptive routing behavior described by the surrounding call sites never actually happens.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
The flat 3 is intentional. The previous scheme (2/3/5) was counterproductive: 5 relays for narrow geohashes added duplicate downloads with negligible coverage benefit (few messages, nearby reliable relays), while 2 for broad geohashes was fragile (1 relay down = 50% loss, no failover). Relay selection is deterministic per geohash (all users get the same set via closestRelaysForGeohash), so relay count doesn't affect cross-user message delivery. The function still accepts the geohash parameter so it can be made adaptive again in the future if needed.
This commit ensures that `currentActiveGeohash` is cleared when unsubscribing from all channels. This prevents stale data from being carried over when the user switches to a new channel.
| handler: (NostrEvent) -> Unit, | ||
| includeDefaults: Boolean = false, | ||
| nRelays: Int = 5 | ||
| nRelays: Int = 5 |
There was a problem hiding this comment.
I didn't notice since I didn't edit that part but I'll fix
| Log.d(TAG, "🌍 App backgrounded: Pausing sampling, presence heartbeat, and presence subscription") | ||
| activeSamplingGeohashes.forEach { subscriptionManager.unsubscribe("sampling-$it") } | ||
| globalPresenceJob?.cancel() | ||
| globalPresenceJob = null | ||
| currentPresenceSubId?.let { subscriptionManager.unsubscribe(it); currentPresenceSubId = null } | ||
| } |
There was a problem hiding this comment.
We discussed wanting to pause kind 20001 subscriptions, not the heartbeats coming from our device. I think we want the heartbeat to go on even on background. Here the comment says "presence heartbeat" but anyway I think this is only pausing the samplings and it's just the comment that is misleading.
There was a problem hiding this comment.
I was pausing both, will revert the heartbeat pausing so we only pause Kind 20001 in currentPresenceSubId?.let { subscriptionManager.unsubscribe(it); currentPresenceSubId = null }
This commit removes the explicit start and stop of the global presence heartbeat from the `GeohashViewModel`'s lifecycle `onStart` and `onStop` callbacks. The `globalPresenceJob` is no longer cancelled when the app is backgrounded, simplifying the lifecycle management logic. This suggests the heartbeat's lifecycle is now managed elsewhere. Diagnostic log messages have been updated to reflect this change.
|
ACK @4942b71 . Untested. |
Nostr Traffic Optimization
Changes
Use 3 relays for all geohash subscriptions — balances reliability (survives 1 relay down) with bandwidth (limits duplicate event downloads)
Pause presence heartbeat when backgrounded —
globalPresenceJobwas running 24/7 broadcasting to all channels even when the app was not visible. Now cancelled on background, restarted on foreground.Split message/presence subscriptions — Active channel subscription split into kind 20000 (messages, stays open always) and kind 20001 (presence heartbeats, closed on background). Presence events are high-frequency but have no value when the app is not visible.
Subscription lifecycle cleanup — Presence sub properly cleaned up on channel switch,
currentActiveGeohashcleared when leaving a location channel,panicResetclears relay manager subscriptions to prevent ghost subs on reconnect.Checklist