-
Notifications
You must be signed in to change notification settings - Fork 741
fix: potential deadlock on thread pool shutdown #12050
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
In case it helps: I've been adding tracing to try to investigate the cause of this in #12013. The original thought was that maybe there was a race condition with Note: this was necessary because using a debug build apparently fixed the problem! So far, I've had it trace in that block before waiting (waiting during shutdown), and ten minutes after waiting (timeout after shutdown). The tracing in the critical loop in Here are examples of what has been printed in some hanging runs (the end of every code block is the last thing printed), obtained by running In most of the successful linting cases, there are no workers waiting during shutdown. But here is an example of a case where a worker is waiting during shutdown, but we do not hang. Anyway, just wanted to share this information publicly in case it's helpful. We're going to run the |
|
Mathlib CI status (docs):
|
|
Reference manual CI status:
|
| m_queue_cv.notify_all(); | ||
| // we can assume that `m_std_workers` will not be changed after this line | ||
| } | ||
| m_queue_cv.notify_all(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, if I'm not wrong, I think doesn't change much as it stands, as m_shutting_down is always accessed by the workers with the lock acquired, so 'visibility' wouldn't change with this reordering.
I think this #12052 is more likely to be the problem: you can see from @thorimur's comment in this PR that we caught this happening with the workers at the limit:
No description provided.