fix(jwt): race conditions and IAM reliability issues with identity verification enabled by nan-li · Pull Request #2609 · OneSignal/OneSignal-Android-SDK

nan-li · 2026-04-09T19:51:27Z

Description

One Line Summary

Fix multiple issues with identity verification (IV) enabled: startup race conditions, missing JWT invalidated callback, stuck IAM fetch after login, and IAM retry on expired JWT.

Details

Motivation

With identity verification enabled, several issues surfaced during testing:

Anonymous operations were sometimes not purged on cold start because the purge could run before the operation queue had finished loading
The onUserJwtInvalidated listener might not be called when the JWT was missing at SDK init time
In-App Messages (IAMs) were sometimes not fetched after the first login
If a JWT expired mid-session, the IAM fetch could silently fail with no recovery

Scope

All changes are gated behind identity verification being enabled (useIdentityVerification == true). Normal (non-IV) SDK behavior is unaffected.

Commit 1 -- Fix race condition: purge anonymous ops AFTER queue is loaded

IdentityVerificationService.onModelReplaced (config HYDRATE) could fire before OperationRepo.loadSavedOperations() finished, causing removeOperationsWithoutExternalId() to run against an empty queue
Fixed by wrapping the HYDRATE handler in suspendifyOnIO + awaitInitialized(), following the same pattern as RecoverFromDroppedLoginBug
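
The ordering fix can be illustrated with a minimal sketch. It is not the SDK code: it swaps the coroutine-based suspendifyOnIO + awaitInitialized() for a plain thread and a CountDownLatch so that it runs on the Kotlin standard library alone, and SketchOperationRepo, QueuedOp, and onConfigHydrated are hypothetical names.

```kotlin
import java.util.concurrent.CopyOnWriteArrayList
import java.util.concurrent.CountDownLatch
import kotlin.concurrent.thread

// Hypothetical stand-in for a queued operation; not the SDK's Operation type.
data class QueuedOp(val name: String, val externalId: String?)

// Hypothetical stand-in for OperationRepo: the queue is populated asynchronously.
class SketchOperationRepo {
    private val initialized = CountDownLatch(1)
    private val queue = CopyOnWriteArrayList<QueuedOp>()

    // Simulates loading persisted operations, then releases anyone waiting.
    fun loadSavedOperations(saved: List<QueuedOp>) {
        queue.addAll(saved)
        initialized.countDown()
    }

    // Blocks the calling thread until loadSavedOperations() has completed.
    fun awaitInitialized() = initialized.await()

    // Drops anonymous operations and reports how many were removed.
    fun removeOperationsWithoutExternalId(): Int {
        val anonymous = queue.filter { it.externalId == null }
        queue.removeAll(anonymous)
        return anonymous.size
    }

    fun snapshot(): List<QueuedOp> = queue.toList()
}

// The HYDRATE handler leaves the calling thread and waits for the load before purging.
fun onConfigHydrated(repo: SketchOperationRepo): Thread = thread {
    repo.awaitInitialized()
    repo.removeOperationsWithoutExternalId()
}
```

Without the awaitInitialized() call, a handler that fires early would purge an empty queue and the anonymous operations loaded afterwards would survive.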

Commit 2 -- Replay JWT invalidated event to late-registered listeners

fireJwtInvalidated now buffers the externalId when no listeners are subscribed (e.g. during SDK init HYDRATE) and replays it when the first IUserJwtInvalidatedListener is added
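
A minimal sketch of the buffer-and-replay idea, under stated assumptions: JwtInvalidatedListener and JwtInvalidatedNotifier are hypothetical stand-ins for IUserJwtInvalidatedListener and the relevant part of UserManager, the callback takes a bare externalId, and dispatch is synchronous here, whereas the PR states that it preserves async dispatch.

```kotlin
// Hypothetical listener shape; the SDK's listener interface may differ.
fun interface JwtInvalidatedListener {
    fun onUserJwtInvalidated(externalId: String)
}

class JwtInvalidatedNotifier {
    private val lock = Any()
    private val listeners = mutableListOf<JwtInvalidatedListener>()
    private var pendingExternalId: String? = null // buffered event, if any

    fun fireJwtInvalidated(externalId: String) {
        // Check for subscribers and buffer under one lock so a concurrent
        // subscribe cannot slip in between the check and the store.
        val targets: List<JwtInvalidatedListener> = synchronized(lock) {
            if (listeners.isEmpty()) {
                pendingExternalId = externalId
                emptyList()
            } else {
                listeners.toList()
            }
        }
        // Invoke callbacks outside the lock to avoid holding it during user code.
        targets.forEach { it.onUserJwtInvalidated(externalId) }
    }

    fun addJwtInvalidatedListener(listener: JwtInvalidatedListener) {
        // Register and take the buffered event atomically; it is replayed once.
        val replay = synchronized(lock) {
            listeners.add(listener)
            pendingExternalId.also { pendingExternalId = null }
        }
        replay?.let { listener.onUserJwtInvalidated(it) }
    }
}
```

Only the most recent invalidation is buffered in this sketch, and only the first listener to subscribe receives the replay.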

Commit 3 -- Fix IAM fetch stuck after login with IV enabled

LoginUserOperationExecutor.createUser() was not passing the RYW token from the backend response to the ConsistencyManager, causing IamFetchReadyCondition to not resolve
Added rywData field to CreateUserResponse, parsed ryw_token/ryw_delay in JSONConverter, and set RYW data in ConsistencyManager after successful createUser
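
The data flow can be sketched roughly as follows. Everything here is an assumption made for illustration: the RywData and CreateUserResponse shapes, the use of a decoded Map in place of a JSON object, and the SketchConsistencyManager with its setRywData and isIamFetchReady methods are hypothetical and do not mirror the SDK's actual signatures. Only the ryw_token and ryw_delay field names come from the PR.

```kotlin
// Hypothetical shapes; field types are assumptions, not the SDK's definitions.
data class RywData(val rywToken: String, val rywDelay: Long?)
data class CreateUserResponse(val rywData: RywData?)

// Stand-in for the JSON conversion step: reads from an already-decoded map.
fun convertToCreateUserResponse(body: Map<String, Any?>): CreateUserResponse {
    val token = body["ryw_token"] as? String
    val delay = (body["ryw_delay"] as? Number)?.toLong()
    // No token means there is nothing to wait on, so rywData stays null.
    return CreateUserResponse(token?.let { RywData(it, delay) })
}

// Stand-in for the consistency manager: the IAM fetch condition resolves
// for a user once RYW data has been recorded for that user.
class SketchConsistencyManager {
    private val rywByUser = mutableMapOf<String, RywData>()

    fun setRywData(onesignalId: String, data: RywData) {
        rywByUser[onesignalId] = data
    }

    fun isIamFetchReady(onesignalId: String): Boolean = onesignalId in rywByUser
}

// The step that was missing: hand the RYW data over after createUser succeeds.
fun onCreateUserSuccess(
    onesignalId: String,
    response: CreateUserResponse,
    manager: SketchConsistencyManager,
) {
    response.rywData?.let { manager.setRywData(onesignalId, it) }
}
```

If onCreateUserSuccess is never called, isIamFetchReady stays false, which corresponds to the stuck fetch described above.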

Commit 4 -- Retry IAM fetch after JWT refresh on 401/403 response

InAppBackendService now throws BackendException on unauthorized (401/403) responses instead of returning null
InAppMessagesManager catches the exception, stores a pending retry state, and retries the fetch when JwtTokenStore notifies that the JWT has been refreshed for the same user
Pending retry is cleared on user switch to avoid stale retries
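
The retry flow above can be sketched as a small state machine. This is a simplified illustration, not the InAppMessagesManager implementation: IamFetchCoordinator, UnauthorizedException (standing in for BackendException), and the injected fetch function are hypothetical, and the RYW data that the real code also stores is left out.

```kotlin
// Hypothetical stand-in for the exception thrown on a 401/403 response.
class UnauthorizedException(val statusCode: Int) : Exception("HTTP $statusCode")

class IamFetchCoordinator(private val fetch: (externalId: String) -> List<String>) {
    // Written by the fetch path and read by JWT callbacks, possibly on other threads.
    @Volatile
    private var pendingJwtRetryExternalId: String? = null

    var messages: List<String> = emptyList()
        private set

    val hasPendingRetry: Boolean
        get() = pendingJwtRetryExternalId != null

    fun fetchMessages(externalId: String) {
        try {
            messages = fetch(externalId)
            pendingJwtRetryExternalId = null
        } catch (e: UnauthorizedException) {
            // Remember who needs a retry instead of failing silently.
            pendingJwtRetryExternalId = externalId
        }
    }

    // Called when a JWT has been refreshed for the given user.
    fun onJwtUpdated(externalId: String) {
        if (pendingJwtRetryExternalId == externalId) {
            pendingJwtRetryExternalId = null
            fetchMessages(externalId)
        }
    }

    // Called on login/logout so a stale retry never fires for the wrong user.
    fun onUserSwitched() {
        pendingJwtRetryExternalId = null
    }
}
```

A JWT refresh for a different user, or a refresh when nothing is pending, is a no-op in this sketch, matching the cases listed under unit testing.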

Commit 5 -- Fix PR review issues: fallback 401 handling, volatile fields, TOCTOU race

fetchInAppMessagesWithoutRywToken (the fallback path after RYW retries are exhausted) was missing the 401/403 check, silently returning null and bypassing the JWT retry mechanism
Added @Volatile to pendingJwtRetryExternalId and pendingJwtRetryRywData for cross-thread memory visibility
Fixed a TOCTOU race in UserManager between fireJwtInvalidated and addJwtInvalidatedListener: without synchronization, a listener could subscribe between the hasSubscribers check and the pendingJwtInvalidatedExternalId store, causing the event to be lost permanently

Testing

Unit testing

Updated LoginUserOperationExecutorTests to include the new IConsistencyManager dependency (all 16 constructor calls)
Added test in InAppBackendServiceTests: 401 response throws BackendException
Added test in InAppBackendServiceTests: 401 from fallback (no RYW token) path throws BackendException
Added 5 tests in InAppMessagesManagerTests covering the JWT 401 retry flow:
- 401 stores pending retry state
- JWT refresh for matching user triggers retry
- JWT refresh for different user does not retry
- No-op when no pending retry exists
- User switch clears pending retry state

Manual testing

Tested with IV enabled on a fresh install: verified no IAM fetch for anonymous user, IAM fetched after login
Tested JWT expiry mid-session: verified IAM fetch retries after updateUserJwt is called
Tested login/logout cycle: verified pending retry is cleared on user switch

Affected code checklist

In-App Messaging
REST API requests
Public API changes

Checklist

Overview

I have filled out all REQUIRED sections above
PR does one thing (fix IV-related issues for 5.8 release)
Any Public API changes are explained in the PR details and conform to existing APIs

Testing

I have included test coverage for these changes, or explained why they are not needed
All automated tests pass, or I explained why that is not possible
I have personally tested this on my device, or explained why that is not possible

Final pass

Code is as readable as possible
I have reviewed this PR myself, ensuring it meets each checklist item

IdentityVerificationService.onModelReplaced (config HYDRATE) could fire before OperationRepo.loadSavedOperations() finished, causing removeOperationsWithoutExternalId() to run against an empty queue. Wrap the HYDRATE handler in suspendifyOnIO + awaitInitialized() so the purge waits for the queue to be fully populated, following the same pattern as RecoverFromDroppedLoginBug. Made-with: Cursor

fireJwtInvalidated now buffers the externalId when no listeners are subscribed (e.g. during SDK init HYDRATE) and replays it when the first IUserJwtInvalidatedListener is added. Preserves async dispatch and runCatching for runtime 401 path. Made-with: Cursor

LoginUserOperationExecutor.createUser() was not passing the RYW token from the backend response to the ConsistencyManager, causing the IamFetchReadyCondition to never resolve after login. This meant InAppMessagesManager coroutines awaited forever and IAMs were never fetched for the logged-in user.
- Add rywData field to CreateUserResponse
- Parse ryw_token/ryw_delay in JSONConverter.convertToCreateUserResponse
- Set RYW data in ConsistencyManager after successful createUser
Made-with: Cursor

When an IAM fetch returns an unauthorized response (401 or 403), the SDK now saves the pending fetch state and automatically retries once the JWT is refreshed for the same user. Switching users clears any stale retry. Made-with: Cursor

...sages/src/main/java/com/onesignal/inAppMessages/internal/backend/impl/InAppBackendService.kt

...ain/java/com/onesignal/user/internal/operations/impl/executors/LoginUserOperationExecutor.kt

...l/in-app-messages/src/main/java/com/onesignal/inAppMessages/internal/InAppMessagesManager.kt

OneSignalSDK/onesignal/core/src/main/java/com/onesignal/user/internal/UserManager.kt

nan-li · 2026-04-10T03:35:16Z

@claude re-review

nan-li · 2026-04-10T04:21:43Z

@claude re-review

nan-li · 2026-04-10T16:16:16Z

@claude[agent] review

Claude · 2026-04-10T16:16:22Z

@nan-li The model is not available for your account. This can happen if the model was disabled by your organization's policy or if your Copilot plan doesn't include access to it.

You can try again without specifying a model (just @copilot) to use the default, or choose a different model from the model picker.