-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[LLVM][Cling] Add Automatic Shared Library Resolver. #19514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Test Results 22 files 22 suites 3d 17h 46m 1s ⏱️ Results for commit c71cdd6. ♻️ This comment has been updated with latest results. |
|
Most of these tests are currently failing: Some of them are failing with this error: |
How do we solve that? |
|
Do you have any idea what this macro is doing? I'm not familiar with it: It looks like it's trying to load a library into Cling, but I'm not sure exactly where or how this is being triggered. |
|
This is user code forcing a dlopen. |
|
I'm seeing this error in those tests even on my master branch. |
You can check here what are the issues https://github.com/root-project/root/runs/47314709783 I see on windows: Can we resolve these first? |
|
LLVM Error: if (Obj->isCOFF())
return; // TODO: Add COFF support |
Sure, if that does not work in the master that’s a reasonable fix. |
|
Do you think these test failures are related to our changes? I have some async workarounds locally, so we could try them there too. |
Yes, most of them seem like this. The ROOT CI is generally green. |
|
Ok, I first need to understand this error. Do you know where this macro expands? |
You can build the master and break at dlopen. The debugger will show you when and where it gets expanded. |
|
I tried to reproduce the CI failure for the test This happens when the code tries to read symbols from a library that isn’t loaded yet (similar to ROOT/Cling’s The same type of error is also showing up on other platforms:
These errors cause test failures due to error diffs. The To test locally, I ran on my Mac M1 with the same Docker image and steps used in CI: docker run --security-opt label=disable --platform linux/amd64 -it registry.cern.ch/root-ci/ubuntu2504:buildreadyFollowing the GitHub Actions steps, the test passed locally, so I couldn’t reproduce the failure outside CI. |
@dpiparo, we need help here with the reproduction logic I guess... |
|
Hi @SahilPatidar looks like we have a relatively stable build. I'd like to point out that the two failing tests on macbeta: are unrelated to this PR. The TMVA test failures only on alma platforms seems strange, can you rebase on master? |
|
Ok, I’ll do that. But I recently rebased it — if the alma tests are failing due to this PR, could you let me know what might be going wrong based on the test failure details? |
The 3 failing tests belong to ROOT's machine learning component - TMVA. Two of the TMVA tests ( And on both alma8 and alma10 we see a seg fault coming from Perhaps you can try reproducing these with the relevant docker images? |
I think they are failing in many other PR, too |
|
If that's so, we should move forward... |
Yes, looks like none of the test failures here are related.
Yes, @SahilPatidar @vgvassilev I'd like to know how we proceed in this case. If the PR ready to land in LLVM, can we go ahead there? I assume we rework the DLM in Cling to use the new LibraryResolver with the next upgrade? |
|
I’d wait until landing this in llvm. I think we are quite close now. |
|
One more task I need to complete is moving LLVM PR code from Orc/TargetProcess to Orc/Shared, so that it can be used on the Root side as an API. |
| }, | ||
| config); | ||
| return res; | ||
| // return m_Dyld->searchLibrariesForSymbol(mangledName, searchSystem); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need to delete the rest of the content in this file?
|
@SahilPatidar, can we update the performance table similar to #6969. Also we should remove the redundant code from cling now and make this ready to review. |
|
Upstream PR: llvm/llvm-project#169161 |
vgvassilev
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks reasonable to me. @hahnjo probably has the instructions how to backport these to the llvm fork of ROOT.
hahnjo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SahilPatidar this PR is a bit light on motivation and details:
- What is the intent behind upstreaming?
- What is the benefit for us (ROOT) of the work?
- What is the benefit to backport this now and introduce additional patches? (AFAICT there was nothing backported so far, it all lived in Cling)
Regarding the last point, note that we managed to allow building against vanilla LLVM since some time (years) now. That would be essentially broken with this change, which is a bit unfortunate and that's why I'm trying to understand the bigger context.
This code lifts a major component of cling upstream and backports it following its new design. If we don’t backport now, it needs to happen in the new upgrade but we risk all of that to rot and we won’t have anybody to update it later. |
This is not correct / precise: Since we currently don't use upstream facilities for this, it doesn't need to happen during the upgrade. The simplification can land in a second step once the upstream code is already there, as we do it for all other changes where we don't already have to carry downstream patches. |
That’s what I am saying: at the risk of rotting and not having who to follow through with the current pr. |
|
@hahnjo The main idea behind this API was not only cling, but also future ORC JIT use. While redesigning |
|
Very nice work!
We definitely should not break building against vanilla LLVM at this point, as many packagers are relying on that. Most notably Conda, which is also a priority for us (@vepadulano, any thoughts?). Also for NixOS is would be a nuisance. Packagers will probably just revert this PR in some pre-configuration step once they figured out what's going on. Also, there are probably other power users in the wild who build against external LLVM. Having to explain them why it stopped working and that it will soon work again would not be great for the technical credibility of ROOT. |
This PR introduces a redesigned
LibraryResolverbased on Cling’sDyld, so it can be integrated into ORC JIT.Ubuntu
MacOS: