Fix issues with resuming async tasks awaiting a future (backport #1469) #1560

Merged: ahcorde merged 6 commits into ros2:jazzy from bjsowa:backport/jazzy/fix-async on Jan 15, 2026


@bjsowa
Contributor

@bjsowa bjsowa commented Dec 8, 2025

Backport of #1469 to Jazzy

@bjsowa
Contributor Author

bjsowa commented Dec 8, 2025

@nadavelkabets I modified the deque logic to preserve the LIFO order which is maintained for backward compatibility in Jazzy. Could you check it?

for _ in range(ready_tasks_count):
    # We pop from the right to maintain LIFO order for backward compatibility
    # From Kilted onwards, the order of task execution is FIFO
    task = self._ready_tasks.pop()
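
For readers less familiar with the distinction, here is a minimal sketch of the two drain orders on a `collections.deque`. The helper names `drain_lifo` and `drain_fifo` are hypothetical and are not part of rclpy; only `deque.pop()` and `deque.popleft()` are real standard-library calls.

```python
from collections import deque


def drain_lifo(ready):
    """Pop from the right: the most recently queued item runs first."""
    order = []
    for _ in range(len(ready)):
        order.append(ready.pop())
    return order


def drain_fifo(ready):
    """Pop from the left: the oldest queued item runs first."""
    order = []
    for _ in range(len(ready)):
        order.append(ready.popleft())
    return order


# Hypothetical task names, used only for illustration.
lifo_order = drain_lifo(deque(['a', 'b', 'c']))
fifo_order = drain_fifo(deque(['a', 'b', 'c']))
print(lifo_order)
print(fifo_order)
```

The only difference between the two helpers is which end of the deque is consumed, which is why switching the ordering is a one-line change in the snippet under review.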
Contributor

@nadavelkabets nadavelkabets Dec 13, 2025

I believe it's more complicated to mimic the old behavior precisely.
While iterating over ready tasks, new tasks may get added to the ready tasks list by either a user that created a new task or a finished task that triggered done callbacks.
In the existing implementation, these would only execute in the next _wait_for_ready_callbacks iteration (they iterate over a copy of the tasks list) while in our implementation they would execute right away (we iterate over the same list).
In addition, prior to our changes, blocked tasks stayed in the tasks list, whereas we now remove them and append them back. This also means that we change the execution order of resumed tasks.
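
A simplified sketch of the difference described above, not the actual rclpy code. The function names, the task names, and the use of `reversed` to model LIFO order are all assumptions made for illustration. The first function iterates over a copy of the task list, so a task queued mid-pass waits for the next pass. The second drains the live deque from the right with a count captured up front, so a task queued mid-pass runs immediately and pushes an older task to the next pass.

```python
from collections import deque


def run_snapshot_pass(tasks):
    """Iterate over a copy; tasks queued during the pass are deferred."""
    snapshot = list(tasks)
    tasks.clear()  # simplification: the real list kept blocked tasks in place
    executed = []
    for task in reversed(snapshot):  # assumed LIFO order
        executed.append(task)
        if task == 'parent':
            tasks.append('child')  # queued mid-pass, not in the snapshot
    return executed


def run_live_pass(ready):
    """Pop from the live deque; tasks queued during the pass run right away."""
    executed = []
    for _ in range(len(ready)):  # count is captured before the loop starts
        task = ready.pop()
        executed.append(task)
        if task == 'parent':
            ready.append('child')  # lands on the right, so it is popped next
    return executed


snapshot_tasks = ['other', 'parent']
snapshot_order = run_snapshot_pass(snapshot_tasks)

live_ready = deque(['other', 'parent'])
live_order = run_live_pass(live_ready)

print(snapshot_order, snapshot_tasks)
print(live_order, list(live_ready))
```

In this toy model the snapshot pass runs `parent` and `other` and leaves `child` for later, while the live pass runs `parent` and `child` and leaves `other` behind.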

The only perfect solution I can imagine is to add a creation_time timestamp to the TaskData class and use a heapq for the ready_tasks list to keep it sorted from oldest task to newest task. Furthermore, we currently utilize done callbacks to execute resumed tasks. To maintain the same order, we would have to give the done callback the same timestamp as the original task.
I honestly find the idea of switching Jazzy to FIFO order more realistic and straightforward.
Is that unreasonable?
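
The heapq idea above could look roughly like the following sketch. The `TaskData` class here is a hypothetical stand-in, not rclpy's class, and a monotonically increasing counter replaces a real timestamp (an assumption, chosen to avoid ties). A resumed task is pushed back with its original `creation_time`, so it keeps its place relative to newer tasks.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Stand-in for a creation timestamp (assumption: a counter avoids ties).
_creation_counter = itertools.count()


@dataclass(order=True)
class TaskData:
    """Hypothetical stand-in; only creation_time takes part in ordering."""
    creation_time: int
    name: str = field(compare=False)


task_a = TaskData(next(_creation_counter), 'a')
task_b = TaskData(next(_creation_counter), 'b')
task_c = TaskData(next(_creation_counter), 'c')

ready_tasks = []
heapq.heappush(ready_tasks, task_b)
heapq.heappush(ready_tasks, task_c)
# 'a' was blocked and becomes ready last, but keeps its original timestamp.
heapq.heappush(ready_tasks, task_a)

# heapq is a min-heap, so the oldest task is popped first.
order = [heapq.heappop(ready_tasks).name for _ in range(3)]
print(order)
```

This illustrates why the approach adds bookkeeping: every path that re-queues a task, including done callbacks, would need to carry the original timestamp along.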

Contributor Author

If changing the order to FIFO is an option, I'm all for it.

Contributor

I couldn't attend the working group meeting this week, but it appears that they're fine with the switch to FIFO ordering.

Fix issues with resuming async tasks awaiting a future (backport #1469) #1560
Try a full CI, to check if we are breaking something by modifying this internal behavior.
In other repos we went ahead with similar changes since people shouldn’t rely on this internal ordering of execution for wait-set based executors.

@kscottz
Contributor

kscottz commented Dec 18, 2025

We discussed this in our weekly project meeting. We've added your issue to the next Client Library Working Group Meeting on January 9th for discussion and you have been added to the agenda.

bjsowa and others added 3 commits January 11, 2026 17:09
…#1469)

Signed-off-by: Błażej Sowa <[email protected]>
Signed-off-by: Nadav Elkabets <[email protected]>
Co-authored-by: Nadav Elkabets <[email protected]>
…ard compatibility"

This reverts commit c388647.

Signed-off-by: Błażej Sowa <[email protected]>
@bjsowa bjsowa force-pushed the backport/jazzy/fix-async branch from 215ca8d to 8ea7805 Compare January 11, 2026 16:25
Signed-off-by: Błażej Sowa <[email protected]>
@bjsowa bjsowa force-pushed the backport/jazzy/fix-async branch from 8ea7805 to d0780c9 Compare January 11, 2026 16:25
@bjsowa
Contributor Author

bjsowa commented Jan 11, 2026

I reverted the changes related to maintaining LIFO order and modified the test that relied on ordering. I think it's ready to run the full CI @mjcarroll @ahcorde @nadavelkabets

@bjsowa bjsowa requested a review from nadavelkabets January 11, 2026 16:29
Contributor

@nadavelkabets nadavelkabets left a comment

Apart from the 2 minor suggestions, LGTM.
Are tests failing for you locally? The failing tests do not seem related...


coroutine_future = executor.create_task(coroutine)

start_time = time.monotonic()
Contributor

If I remember correctly, time.monotonic caused this test to fail in the Windows environment, and this was fixed in Rolling by switching to perf_counter. Take a look at #1564.
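
As a rough illustration of the suggested switch, the sketch below prints what each clock reports about itself via `time.get_clock_info` and then times a short sleep with `time.perf_counter`. The reported implementation and resolution depend on the platform and Python version, so no particular values are assumed here, and the 10 ms sleep is an arbitrary choice for the example.

```python
import time

# Inspect both clocks; values vary by platform and Python version.
for clock_name in ('monotonic', 'perf_counter'):
    info = time.get_clock_info(clock_name)
    print(clock_name, info.implementation, info.resolution)

# Measure a short interval with the higher-resolution performance counter.
start_time = time.perf_counter()
time.sleep(0.01)
elapsed = time.perf_counter() - start_time
print(f'elapsed: {elapsed:.4f} s')
```

A test that asserts on short elapsed intervals is sensitive to clock resolution, which is one plausible reason a clock change could affect it on a single platform.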

Contributor Author

I backported the change

Co-authored-by: Nadav Elkabets <[email protected]>
Signed-off-by: Błażej Sowa <[email protected]>
@bjsowa
Contributor Author

bjsowa commented Jan 11, 2026

> Are tests failing for you locally? The failing tests do not seem related...

Nope, the tests are passing for me locally.

@bjsowa bjsowa force-pushed the backport/jazzy/fix-async branch from efc9f00 to 8e9fb18 Compare January 11, 2026 23:34
@bjsowa bjsowa requested a review from nadavelkabets January 11, 2026 23:48
@JWhitleyWork

Looks like this also needs a backport to humble based on this comment.

Thanks again for working on this!

@nadavelkabets
Contributor

@fujitatomoya @skyegalaxy @ahcorde @mjcarroll
Please run full CI on this

@ahcorde
Contributor

ahcorde commented Jan 13, 2026

Pulls: #1560
Gist: https://gist.githubusercontent.com/ahcorde/5de7a805edbbfe64f19e112a28b1c866/raw/92ec171f8acddb3c7c698fc860099257102cc621/ros2.repos
BUILD args: --packages-above-and-dependencies rclpy
TEST args: --packages-above rclpy
ROS Distro: jazzy
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/17928

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@bjsowa
Contributor Author

bjsowa commented Jan 13, 2026

@ahcorde Linux and Linux-rhel CI did not start properly due to some internal buildfarm error

@ahcorde ahcorde merged commit 197e513 into ros2:jazzy Jan 15, 2026
3 checks passed
@JWhitleyWork

Would one of you mind creating the backport for this to humble?

@bjsowa
Contributor Author

bjsowa commented Jan 16, 2026

> Would one of you mind creating the backport for this to humble?

I started backporting it to humble but had some issues with some tests randomly failing. I'll try to create a PR next week.

@JWhitleyWork

@bjsowa Thanks for the update!

6 participants