Fix issues with resuming async tasks awaiting a future (backport #1469) #1560
ahcorde merged 6 commits into ros2:jazzy
@nadavelkabets I modified the deque logic to preserve the LIFO order, which is maintained for backward compatibility in Jazzy. Could you check it?
rclpy/rclpy/executors.py
Outdated
    for _ in range(ready_tasks_count):
        # We pop from the right to maintain LIFO order for backward compatibility
        # From Kilted onwards, the order of task execution is FIFO
        task = self._ready_tasks.pop()
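The ordering difference this hunk is concerned with can be sketched with a plain `collections.deque` standing in for the executor's ready-task queue. This is only an illustration: `drain_lifo` and `drain_fifo` are hypothetical helpers, not rclpy functions, and the string items stand in for real tasks. Both helpers fix the number of iterations up front, mirroring the `range(ready_tasks_count)` bound above.

```python
from collections import deque


def drain_lifo(ready):
    """Pop from the right: the most recently queued task runs first."""
    order = []
    for _ in range(len(ready)):  # the count is fixed before the loop starts
        order.append(ready.pop())
    return order


def drain_fifo(ready):
    """Pop from the left: the oldest queued task runs first."""
    order = []
    for _ in range(len(ready)):
        order.append(ready.popleft())
    return order


lifo_order = drain_lifo(deque(['a', 'b', 'c']))
fifo_order = drain_fifo(deque(['a', 'b', 'c']))
print(lifo_order)  # ['c', 'b', 'a']
print(fifo_order)  # ['a', 'b', 'c']
```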
I believe it's more complicated to mimic the old behavior precisely.
While iterating over ready tasks, new tasks may get added to the ready tasks list by either a user that created a new task or a finished task that triggered done callbacks.
In the existing implementation, these would only execute in the next _wait_for_ready_callbacks iteration (it iterates over a copy of the tasks list), while in our implementation they would execute right away (we iterate over the same list).
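A toy model of that difference, assuming a task named `'parent'` schedules a `'child'` task while it runs (for example through a done callback). The helper names and the reversed iteration over the copy are assumptions made for illustration; neither function is taken from rclpy.

```python
from collections import deque


def run_over_snapshot(tasks):
    """Iterate over a copy: tasks added during the pass wait for the next pass."""
    ran = []
    for task in reversed(list(tasks)):  # newest first, over a snapshot
        tasks.remove(task)
        ran.append(task)
        if task == 'parent':
            tasks.append('child')  # queued, but not part of the snapshot
    return ran


def run_over_live_queue(tasks):
    """Pop from the queue itself: a task added during the pass is popped next."""
    ran = []
    for _ in range(len(tasks)):  # count fixed up front, as in the hunk above
        task = tasks.pop()
        ran.append(task)
        if task == 'parent':
            tasks.append('child')  # lands on the right and is popped right away
    return ran


snapshot_queue = deque(['other', 'parent'])
snapshot_ran = run_over_snapshot(snapshot_queue)

live_queue = deque(['other', 'parent'])
live_ran = run_over_live_queue(live_queue)

print(snapshot_ran, list(snapshot_queue))  # ['parent', 'other'] ['child']
print(live_ran, list(live_queue))          # ['parent', 'child'] ['other']
```

In the live-queue variant the newly added task not only runs in the same pass but also displaces an older ready task, which is why matching the old behavior exactly is harder than swapping `popleft()` for `pop()`.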
In addition, prior to our changes, blocked tasks stayed in the tasks list, whereas we now remove them and append them back. This also means that we change the execution order of resumed tasks.
The only perfect solution I can imagine is to add a creation_time timestamp to the TaskData class and use a heapq for the ready_tasks list to keep it sorted from oldest task to newest task. Furthermore, we currently utilize done callbacks to execute resumed tasks. To maintain the same order, we would have to give the done callback the same timestamp as the original task.
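A rough sketch of what such a creation-ordered heap could look like. Everything here is hypothetical: `TaskEntry` stands in for the real TaskData class, and a monotonically increasing counter replaces the proposed creation_time timestamp so that two entries can never tie.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Stand-in for a creation timestamp: each new entry gets the next integer.
_creation_counter = itertools.count()


@dataclass(order=True)
class TaskEntry:
    creation_order: int
    name: str = field(compare=False)  # payload is ignored when comparing entries


def make_entry(name):
    return TaskEntry(next(_creation_counter), name)


first = make_entry('first')
second = make_entry('second')
third = make_entry('third')

# Push out of creation order, as if 'first' had been blocked and resumed last.
# The resumed task reuses its original entry, and with it its original stamp.
ready = []
for entry in (second, third, first):
    heapq.heappush(ready, entry)

# The heap always yields the oldest entry first, regardless of push order.
execution_order = [heapq.heappop(ready).name for _ in range(len(ready))]
print(execution_order)  # ['first', 'second', 'third']
```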
I honestly find the idea of switching Jazzy to FIFO order more realistic and straightforward.
Is that unreasonable?
If changing the order to FIFO is an option, I'm all for it.
I couldn't attend the working group meeting this week but it appears like they're fine with the switch to FIFO ordering.
Try a full CI, to check if we are breaking something by modifying this internal behavior.
In other repos we went ahead with similar changes since people shouldn’t rely on this internal ordering of execution for wait-set based executors.
We discussed this in our weekly project meeting. We've added your issue to the next Client Library Working Group Meeting on January 9th for discussion and you have been added to the agenda.
…#1469) Signed-off-by: Błażej Sowa <[email protected]> Signed-off-by: Nadav Elkabets <[email protected]> Co-authored-by: Nadav Elkabets <[email protected]>
…atibility Signed-off-by: Błażej Sowa <[email protected]>
…ard compatibility" This reverts commit c388647. Signed-off-by: Błażej Sowa <[email protected]>
215ca8d to 8ea7805
Signed-off-by: Błażej Sowa <[email protected]>
8ea7805 to d0780c9
I reverted the changes related to maintaining LIFO order and modified the test that relied on ordering. I think it's ready to run the full CI @mjcarroll @ahcorde @nadavelkabets
nadavelkabets left a comment
Apart from the 2 minor suggestions, LGTM.
Are tests failing for you locally? The failing tests do not seem related...
rclpy/test/test_executor.py
Outdated
    coroutine_future = executor.create_task(coroutine)

    start_time = time.monotonic()
If I remember correctly, time.monotonic caused this test to fail on Windows, and this was fixed in Rolling by switching to perf_counter. Take a look at #1564.
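For reference, a minimal sketch of timing an interval with `time.perf_counter`, along with a way to inspect what resolution each clock reports on the current platform. The thread does not state why `time.monotonic` failed on Windows; clock resolution is only one possible explanation, and the `time.sleep` call is a stand-in for the work the test waits on.

```python
import time

# Resolution each clock reports on this platform, in seconds.
monotonic_resolution = time.get_clock_info('monotonic').resolution
perf_counter_resolution = time.get_clock_info('perf_counter').resolution
print(f'monotonic:    {monotonic_resolution}')
print(f'perf_counter: {perf_counter_resolution}')

# Measure a short interval with the highest-resolution clock available.
start_time = time.perf_counter()
time.sleep(0.01)  # placeholder for the awaited work in the test
elapsed = time.perf_counter() - start_time
print(f'elapsed: {elapsed:.6f} s')
```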
I backported the change
Co-authored-by: Nadav Elkabets <[email protected]> Signed-off-by: Błażej Sowa <[email protected]>
Nope, the tests are passing for me locally.
Signed-off-by: Błażej Sowa <[email protected]>
efc9f00 to 8e9fb18
Looks like this also needs a backport to Thanks again for working on this!
@fujitatomoya @skyegalaxy @ahcorde @mjcarroll
Pulls: #1560
@ahcorde Linux and Linux-rhel CI did not start properly due to some internal buildfarm error.
Would one of you mind creating the backport for this to
I started backporting it to humble but had some issues with some tests randomly failing. I'll try to create a PR next week.
@bjsowa Thanks for the update!
Backport of #1469 to Jazzy