Skip to content

Conversation

@SahilPatidar
Copy link

@SahilPatidar SahilPatidar commented Aug 4, 2025

This PR introduces a redesigned LibraryResolver based on Cling’s Dyld, so it can be integrated into ORC JIT.

Ubuntu

Test Name Run With (sec) Without (sec)
roottest-cling 1 272.14 266.27
2 257.97 255.4
3 269.12 266.79
roottest-root-hist 1 16.13 16.25
roottest-root-math 1 38.36 38.60
2 38.21 38.59
roottest-root-tree 1 824.17 835.11
2 822.04 831.43
3 825.95 829.34
roottest-root-treeformula 1 137.48 138.78
2 137.42 139.05
3 137.11 138.78
root-io-stdarray 1 91.71 92.49
2 92.38 92.57
3 91.91 92.80

MacOS:

Test Name Run With (sec) Without (sec)
roottest-cling 1 213.90 209.58
roottest-root-hist 1 19.29 20.15
roottest-root-math 1 33.28 33.06
roottest-root-tree 1 758.85 752.78
roottest-root-treeformula 1 112.09 115.50
root-io-stdarray 1 77.50 80.51

@SahilPatidar
Copy link
Author

@vgvassilev

@github-actions
Copy link

github-actions bot commented Aug 4, 2025

Test Results

    22 files      22 suites   3d 17h 46m 1s ⏱️
 3 794 tests  3 794 ✅ 0 💤 0 ❌
80 330 runs  80 330 ✅ 0 💤 0 ❌

Results for commit c71cdd6.

♻️ This comment has been updated with latest results.

@SahilPatidar
Copy link
Author

Most of these tests are currently failing:

2144: roottest-root-io-evolution-issue-8083-WriteAfterOld  
2145: roottest-root-io-evolution-issue-8083-readfile  
2683: roottest-root-meta-MakeProject-examples  
2684: roottest-root-meta-MakeProject-stltest  
2685: roottest-root-meta-MakeProject-stltest2  
2714: roottest-root-meta-autoloading-ROOT-8432-execcmsWrapper-auto  
2729: roottest-root-meta-autoloading-headerParsingOnDemand-no_autoparse_write  
2730: roottest-root-meta-autoloading-headerParsingOnDemand-no_autoparse_read  
3074: roottest-root-tree-addresses-make  
3138: roottest-root-tree-selectorreader-make  

Some of them are failing with this error:

fatal error: 'libcmswrapper_dictrflx' file not found
R__LOAD_LIBRARY(libcmswrapper_dictrflx)

@vgvassilev
Copy link
Member

Most of these tests are currently failing:

2144: roottest-root-io-evolution-issue-8083-WriteAfterOld  
2145: roottest-root-io-evolution-issue-8083-readfile  
2683: roottest-root-meta-MakeProject-examples  
2684: roottest-root-meta-MakeProject-stltest  
2685: roottest-root-meta-MakeProject-stltest2  
2714: roottest-root-meta-autoloading-ROOT-8432-execcmsWrapper-auto  
2729: roottest-root-meta-autoloading-headerParsingOnDemand-no_autoparse_write  
2730: roottest-root-meta-autoloading-headerParsingOnDemand-no_autoparse_read  
3074: roottest-root-tree-addresses-make  
3138: roottest-root-tree-selectorreader-make  

Some of them are failing with this error:

fatal error: 'libcmswrapper_dictrflx' file not found
R__LOAD_LIBRARY(libcmswrapper_dictrflx)

How do we solve that?

@SahilPatidar
Copy link
Author

Do you have any idea what this macro is doing? I'm not familiar with it:

2523: In file included from input_line_25:1:
2523: /Users/sahilpatidar/Desktop/root/roottest/root/meta/autoloading/ROOT-8432/execcmsWrapper.C:1:1: fatal error: 'libcmswrapper_dictrflx' file not found
2523: R__LOAD_LIBRARY(libcmswrapper_dictrflx)
2523: ^
2523: /Users/sahilpatidar/Desktop/root/build/include/Rtypes.h:458:35: note: expanded from macro 'R__LOAD_LIBRARY'
2523: # define R__LOAD_LIBRARY(LIBRARY) _R_PragmaStr(cling load ( #LIBRARY ))
2523:                                   ^
2523: /Users/sahilpatidar/Desktop/root/build/include/Rtypes.h:457:26: note: expanded from macro '_R_PragmaStr'
2523: # define _R_PragmaStr(x) _Pragma(#x)
2523:                          ^
2523: <scratch space>:5:40: note: expanded from here
2523:  cling load ( "libcmswrapper_dictrflx" )

It looks like it's trying to load a library into Cling, but I'm not sure exactly where or how this is being triggered.

@vgvassilev
Copy link
Member

This is user code forcing a dlopen.

@SahilPatidar
Copy link
Author

I'm seeing this error in those tests even on my master branch.

@vgvassilev
Copy link
Member

I'm seeing this error in those tests even on my master branch.

You can check here what are the issues https://github.com/root-project/root/runs/47314709783

I see on windows: LLVM Error: Unsupported binary format: C:\ROOT-CI\build\bin\gdk-1.3.dll and then a bunch of:

Error in <AutoloadLibraryMU>: Failed to load library /github/home/ROOT-CI/build/test/stressShapescling::DynamicLibraryManager::loadLibrary(): /github/home/ROOT-CI/build/test/threads: cannot dynamically load position-independent executable

Can we resolve these first?

@SahilPatidar
Copy link
Author

LLVM Error: Unsupported binary format: C:\ROOT-CI\build\bin\gdk-1.3.dll
This error occurs because COFF format support for parsing dependencies hasn’t been added yet.
To fix it, we just need to add COFF support, or simply skip it with:

if (Obj->isCOFF())
    return; // TODO: Add COFF support

@vgvassilev
Copy link
Member

LLVM Error: Unsupported binary format: C:\ROOT-CI\build\bin\gdk-1.3.dll This error occurs because COFF format support for parsing dependencies hasn’t been added yet. To fix it, we just need to add COFF support, or simply skip it with:

if (Obj->isCOFF())
    return; // TODO: Add COFF support

Sure, if that does not work in the master that’s a reasonable fix.

@SahilPatidar
Copy link
Author

Do you think these test failures are related to our changes? I have some async workarounds locally, so we could try them there too.

@vgvassilev
Copy link
Member

Do you think these test failures are related to our changes? I have some async workarounds locally, so we could try them there too.

Yes, most of them seem like this. The ROOT CI is generally green.

@SahilPatidar
Copy link
Author

Ok, I first need to understand this error. Do you know where this macro expands?

In file included from input_line_23:1:
/github/home/ROOT-CI/src/roottest/root/tree/selectorreader/runClasses.C:2:1: fatal error: 'SampleClasses_h' file not found
R__LOAD_LIBRARY(SampleClasses_h)
^
/github/home/ROOT-CI/build/include/Rtypes.h:458:35: note: expanded from macro 'R__LOAD_LIBRARY'
# define R__LOAD_LIBRARY(LIBRARY) _R_PragmaStr(cling load ( #LIBRARY ))
                                  ^
/github/home/ROOT-CI/build/include/Rtypes.h:457:26: note: expanded from macro '_R_PragmaStr'
# define _R_PragmaStr(x) _Pragma(#x)
                         ^
<scratch space>:4:33: note: expanded from here
 cling load ( "SampleClasses_h" )

@vgvassilev
Copy link
Member

Ok, I first need to understand this error. Do you know where this macro expands?

In file included from input_line_23:1:
/github/home/ROOT-CI/src/roottest/root/tree/selectorreader/runClasses.C:2:1: fatal error: 'SampleClasses_h' file not found
R__LOAD_LIBRARY(SampleClasses_h)
^
/github/home/ROOT-CI/build/include/Rtypes.h:458:35: note: expanded from macro 'R__LOAD_LIBRARY'
# define R__LOAD_LIBRARY(LIBRARY) _R_PragmaStr(cling load ( #LIBRARY ))
                                  ^
/github/home/ROOT-CI/build/include/Rtypes.h:457:26: note: expanded from macro '_R_PragmaStr'
# define _R_PragmaStr(x) _Pragma(#x)
                         ^
<scratch space>:4:33: note: expanded from here
 cling load ( "SampleClasses_h" )

You can build the master and break at dlopen. The debugger will show you when and where it gets expanded.

@SahilPatidar
Copy link
Author

I tried to reproduce the CI failure for the test roottest-root-aclic-misc-assertmyfun on Ubuntu 25.4. In CI it shows:

Error loading object: Incompatible object file: /github/home/ROOT-CI/build/roottest/root/aclic/misc/addIncludePathGCCMajor_C.so

This happens when the code tries to read symbols from a library that isn’t loaded yet (similar to ROOT/Cling’s Dyld::containSymbol). Normally we check if the library is shared or valid before registering it, so this error shouldn’t appear.

The same type of error is also showing up on other platforms:

  • mac13 — test roottest-root-dataframe-test_reduce:
    Error loading object: No such file or directory
  • alma9 arm64 — test roottest-root-tree-cache-CacheRange:
    Error loading object: The file was not recognized as a valid object file

These errors cause test failures due to error diffs. The Error loading object: message should only appear if something goes wrong in the middle of working with the shared library. This kind of error might not appear in Cling’s dyld implementation, since dyld doesn’t produce such errors.

To test locally, I ran on my Mac M1 with the same Docker image and steps used in CI:

docker run --security-opt label=disable --platform linux/amd64 -it registry.cern.ch/root-ci/ubuntu2504:buildready

Following the GitHub Actions steps, the test passed locally, so I couldn’t reproduce the failure outside CI.

@vgvassilev
Copy link
Member

I tried to reproduce the CI failure for the test roottest-root-aclic-misc-assertmyfun on Ubuntu 25.4. In CI it shows:

Error loading object: Incompatible object file: /github/home/ROOT-CI/build/roottest/root/aclic/misc/addIncludePathGCCMajor_C.so

This happens when the code tries to read symbols from a library that isn’t loaded yet (similar to ROOT/Cling’s Dyld::containSymbol). Normally we check if the library is shared or valid before registering it, so this error shouldn’t appear.

The same type of error is also showing up on other platforms:

* **mac13** — test `roottest-root-dataframe-test_reduce`:
  `Error loading object: No such file or directory`

* **alma9 arm64** — test `roottest-root-tree-cache-CacheRange`:
  `Error loading object: The file was not recognized as a valid object file`

These errors cause test failures due to error diffs. The Error loading object: message should only appear if something goes wrong in the middle of working with the shared library. This kind of error might not appear in Cling’s dyld implementation, since dyld doesn’t produce such errors.

To test locally, I ran on my Mac M1 with the same Docker image and steps used in CI:

docker run --security-opt label=disable --platform linux/amd64 -it registry.cern.ch/root-ci/ubuntu2504:buildready

Following the GitHub Actions steps, the test passed locally, so I couldn’t reproduce the failure outside CI.

@dpiparo, we need help here with the reproduction logic I guess...

@aaronj0
Copy link
Contributor

aaronj0 commented Sep 18, 2025

Hi @SahilPatidar looks like we have a relatively stable build. I'd like to point out that the two failing tests on macbeta:

roottest-root-treeformula-stl-writemap
roottest-root-treeformula-stl-mapvector

are unrelated to this PR. The TMVA test failures only on alma platforms seems strange, can you rebase on master?

@SahilPatidar
Copy link
Author

Ok, I’ll do that. But I recently rebased it — if the alma tests are failing due to this PR, could you let me know what might be going wrong based on the test failure details?

@aaronj0
Copy link
Contributor

aaronj0 commented Sep 19, 2025

Ok, I’ll do that. But I recently rebased it — if the alma tests are failing due to this PR, could you let me know what might be going wrong based on the test failure details?

The 3 failing tests belong to ROOT's machine learning component - TMVA. Two of the TMVA tests (pyunittests-bindings-pyroot-pythonizations-pyroot-pyz-sofie-gnn, tutorial-machine_learning-TMVA_SOFIE_GNN-py) show failed calls to cuInit on alma10 :

07:45:15.852599: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)

And on both alma8 and alma10 we see a seg fault coming from libopenblas on tutorial-machine_learning-TMVA_SOFIE_GNN_Application which seems to be a remanifestation of some regex related issue...

Perhaps you can try reproducing these with the relevant docker images?

@ferdymercury
Copy link
Collaborator

if the alma tests are failing due to this PR, could you let me know what might be going wrong based on the test failure details?

I think they are failing in many other PR, too

@vgvassilev
Copy link
Member

If that's so, we should move forward...

@aaronj0
Copy link
Contributor

aaronj0 commented Sep 23, 2025

I think they are failing in many other PR, too

Yes, looks like none of the test failures here are related.

If that's so, we should move forward...

Yes, @SahilPatidar @vgvassilev I'd like to know how we proceed in this case. If the PR ready to land in LLVM, can we go ahead there? I assume we rework the DLM in Cling to use the new LibraryResolver with the next upgrade?

@vgvassilev
Copy link
Member

I’d wait until landing this in llvm. I think we are quite close now.

@SahilPatidar
Copy link
Author

SahilPatidar commented Sep 23, 2025

One more task I need to complete is moving LLVM PR code from Orc/TargetProcess to Orc/Shared, so that it can be used on the Root side as an API.

},
config);
return res;
// return m_Dyld->searchLibrariesForSymbol(mangledName, searchSystem);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to delete the rest of the content in this file?

@vgvassilev
Copy link
Member

@SahilPatidar, can we update the performance table similar to #6969. Also we should remove the redundant code from cling now and make this ready to review.

@SahilPatidar SahilPatidar changed the title [WIP][LLVM] Test: Validate Auto Shared Library Resolver Correctness. [LLVM][Cling] Add Automatic Shared Library Resolver. Jan 2, 2026
@SahilPatidar SahilPatidar marked this pull request as ready for review January 2, 2026 10:10
@vgvassilev
Copy link
Member

Upstream PR: llvm/llvm-project#169161

@vgvassilev vgvassilev requested a review from hahnjo January 2, 2026 15:32
Copy link
Member

@vgvassilev vgvassilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable to me. @hahnjo probably has the instructions how to backport these to the llvm fork of ROOT.

Copy link
Member

@hahnjo hahnjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SahilPatidar this PR is a bit light on motivation and details:

  • What is the intent behind upstreaming?
  • What is the benefit for us (ROOT) of the work?
  • What is the benefit to backport this now and introduce additional patches? (AFAICT there was nothing backported so far, it all lived in Cling)

Regarding the last point, note that we managed to allow building against vanilla LLVM since some time (years) now. That would be essentially broken with this change, which is a bit unfortunate and that's why I'm trying to understand the bigger context.

@vgvassilev
Copy link
Member

@SahilPatidar this PR is a bit light on motivation and details:

  • What is the intent behind upstreaming?
  • What is the benefit for us (ROOT) of the work?
  • What is the benefit to backport this now and introduce additional patches? (AFAICT there was nothing backported so far, it all lived in Cling)

Regarding the last point, note that we managed to allow building against vanilla LLVM since some time (years) now. That would be essentially broken with this change, which is a bit unfortunate and that's why I'm trying to understand the bigger context.

This code lifts a major component of cling upstream and backports it following its new design. If we don’t backport now, it needs to happen in the new upgrade but we risk all of that to rot and we won’t have anybody to update it later.

@hahnjo
Copy link
Member

hahnjo commented Jan 4, 2026

If we don’t backport now, it needs to happen in the new upgrade

This is not correct / precise: Since we currently don't use upstream facilities for this, it doesn't need to happen during the upgrade. The simplification can land in a second step once the upstream code is already there, as we do it for all other changes where we don't already have to carry downstream patches.

@vgvassilev
Copy link
Member

If we don’t backport now, it needs to happen in the new upgrade

This is not correct / precise: Since we currently don't use upstream facilities for this, it doesn't need to happen during the upgrade. The simplification can land in a second step once the upstream code is already there, as we do it for all other changes where we don't already have to carry downstream patches.

That’s what I am saying: at the risk of rotting and not having who to follow through with the current pr.

@SahilPatidar
Copy link
Author

SahilPatidar commented Jan 5, 2026

@hahnjo The main idea behind this API was not only cling, but also future ORC JIT use. While redesigning cling::Dyld, I tried to create an API that could be useful for ROOT as well, based on how cling currently uses this functionality, and at the same time replace the older implementation.
I didn’t have clear guidance on how ORC JIT wants to integrate such an API yet. ORC JIT is still evolving towards the new executor model, so some design decisions were made based on the current situation and understanding at the time. It’s possible that this design will need adjustments later.
For now, the API is tested in cling. At first The PR was created mainly to validate that this works correctly with ROOT and is aligned with its current usage.

@guitargeek
Copy link
Contributor

Very nice work!

note that we managed to allow building against vanilla LLVM since some time (years) now. That would be essentially broken with this change, which is a bit unfortunate and that's why I'm trying to understand the bigger context.

We definitely should not break building against vanilla LLVM at this point, as many packagers are relying on that. Most notably Conda, which is also a priority for us (@vepadulano, any thoughts?). Also for NixOS is would be a nuisance. Packagers will probably just revert this PR in some pre-configuration step once they figured out what's going on.

Also, there are probably other power users in the wild who build against external LLVM. Having to explain them why it stopped working and that it will soon work again would not be great for the technical credibility of ROOT.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants