Skip to content

Disentangle kernel-manager include tree#3544

Open
JanVogelsang wants to merge 146 commits intonest:mainfrom
JanVogelsang:master
Open

Disentangle kernel-manager include tree#3544
JanVogelsang wants to merge 146 commits intonest:mainfrom
JanVogelsang:master

Conversation

@JanVogelsang
Copy link
Copy Markdown
Contributor

Summary of changes

This PR fixes the horrible kernel-manager singleton design and cleans up all circular include chains which were just hidden behind the kernel-manager and plenty "*_impl.h" files.
Even though this PR touches 250+ files, the only functional changes were done in the kernel_manager.h/.cpp and the other changes are just follow-up refactorings enabled by the changes. I also got rid of most of the impl-files, which are not required anymore and just complicated things.

Why this was necessary

Anyone who ever worked on the kernel will immediately know about the pain the kernel-manager caused. As the kernel-manager included ALL other manager in the kernel_manager.h file, any other class using the kernel-manager therefore also included ALL other managers. This resulted in either of two things:

  1. Any class using the kernel manager could not be used by ANY other manager anymore, as this would lead to a circular include chain.
  2. Any class used by other managers was not allowed to include the kernel-manager in its .h file. This meant:
    2.1. The kernel-manager could only be included in the .cpp file, which meant that the function using the kernel-manager (and thus any of the other managers) could not be inlined, which in some cases is detrimental for performance.
    2.2. We had to create those impl.h files and use the kernel-manager in there. However, impl.h files and correctly including them is a nightmare and leads to various linker issues which are extremely difficult to debug.

The only time when one should use impl.h files is for some template specializations.

@JanVogelsang JanVogelsang self-assigned this Aug 9, 2025
@JanVogelsang JanVogelsang added S: Normal Handle this with default priority T: Maintenance Work to keep up the quality of the code and documentation. labels Aug 9, 2025
@JanVogelsang
Copy link
Copy Markdown
Contributor Author

Important to note: I kept the internal API identical. All I've done essentially was to use references to the managers instead which allowed to forward-declare the managers in the kernel_manager.h file. The reason there are that many changes is that including the kernel_manager.h no longer means that one also includes ALL other managers. So each file using a specific manager has to include that manager explicitly. This is much cleaner and will also speed up compilation a bit.

@github-project-automation github-project-automation bot moved this to In progress in Kernel Aug 9, 2025
@JanVogelsang
Copy link
Copy Markdown
Contributor Author

I have no idea what's supposed to be wrong with the 2 copyright headers. @terhorstd Could you take a look please?

@heplesser heplesser added the I: No breaking change Previously written code will work as before, no one should note anything changing (aside the fix) label Aug 11, 2025
@heplesser
Copy link
Copy Markdown
Contributor

@JanVogelsang This looks very interesting! Just to make sure I understand your solution correctly: Instead of having the individual managers as actual data members of the KernelManager instance, the KernelManager now holds references to the different managers, so that kernel_manager.h only needs forward declarations of the various *Manager classes, right? That then allows removing all the inclusions and the impl-files.

This is certainly a great code cleanup. I am just a tiny bit concerned about performance, since we now have essentially pointers to the individual managers. Have you run any benchmarks yet?

@heplesser
Copy link
Copy Markdown
Contributor

I have no idea what's supposed to be wrong with the 2 copyright headers. @terhorstd Could you take a look please?

Very strange. Have you tried to run the copyright check locally on your machine?

@JanVogelsang
Copy link
Copy Markdown
Contributor Author

JanVogelsang commented Aug 12, 2025

@heplesser Yes, this is correct!

I benchmarked the performance and the PR's solution is indeed around 5% slower (locally and on Juwels). I have to admit that this surprises me a lot, as while this introduces another pointer, I would have assumed that all these pointers (or mostly just the simulation_manager and event_delivery_manager pointers during simulation) would be cached anyway. This seems to not be possible and I am not sure why. Then again, the cause of the performance degradation might also be something entirely different, for example growing code size which harms instruction cache efficiency due to more function inlining.

I implemented two other solutions now, which both require the internal API to change, though:
kernel().simulation_manager ->kernel::manager<SimulationManager>()

  1. A template function with a function-local static manager instance.
namespace kernel {
template<typename ManagerT, std::enable_if_t<std::is_base_of_v<ManagerInterface, ManagerT>>* = nullptr>
ManagerT& manager()
{
  static ManagerT static_manager;
  return static_manager;
}
}
  1. An inline (global) variable for each manager.
namespace kernel {
template <class T>
inline T g_manager_instance;

template <class T>
T& manager() noexcept {
  return g_manager_instance<T>;
}
}

I'm currently benchmarking both solutions, but local benchmarks suggest that the second approach performs best.

@JanVogelsang
Copy link
Copy Markdown
Contributor Author

I have no idea what's supposed to be wrong with the 2 copyright headers. @terhorstd Could you take a look please?

Very strange. Have you tried to run the copyright check locally on your machine?

I managed to solve this, it was just two empty files which I deleted but somehow they were added back in again, just empty.

@heplesser
Copy link
Copy Markdown
Contributor

@JanVogelsang Thanks for checking and the new solutions! I slightly prefer the second option, but that is mainly based on the fact that I still haven't entirely wrapped my mind around the std::enable_if_t construct.

Is it necessary to place this into a separate namespace kernel? Why not have it in the nest namespace and then just use it as manager<SimulationManager>() or maybe even get<SimulationManager>() or (since we have get in many places) use<SimulationManager>() or something like that?

@terhorstd terhorstd changed the title Made the kernel-manager great again Disentangle kernel-manager include tree Aug 13, 2025
Comment thread nestkernel/layer_impl.h
*/
#include "grid_layer.h"
#include "layer.h"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did these lines move the wrong way around? layer.h was deleted, but the _impl file now has all code?
What is the reason for dropping the include-guards?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this, it definitely wasn't intentional. I think I first removed the layer_impl.h, but then added it back again as there was a circular dependency with the grid_layer.h. So I must have accidentally copied too much from the layer.h over to the layer_impl.h

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@terhorstd Then again, it might actually make more sense to move all template function definitions to the impl file (in general, not only here), as the header files should ideally contain declarations only. Finn is currently doing that for all other files as well (at least header-to-cpp, not header-to-impl yet).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear: This is fixed now, the .cpp file contains all non-template function definitions and the .impl file contains the template function definitions.

Comment thread models/CMakeLists.txt Outdated
Comment on lines +375 to +377
#include <algorithm>
#include <iostream>
#include <vector>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jessica-mitchell, this would be a perfect place to link into the Doxygen documentation instead of copy-pasting some random code file into the RST.
For now however some reference syntax could shorten this somewhat unrelated change in this PR. Isn't there a possibility to link to a code file to be included? Instead of this PR updating the docs with the new file, it would be preferable to have something like:

.. code: cpp
   file: ../nestkernel/stopwatch.h

Would avoid this excessive change in an rst file.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked with @terhorstd, so not to prevent this PR from moving forward, we will discuss this documentation issue in issue #3607 and make a decision in a separate PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thanks!

@terhorstd
Copy link
Copy Markdown
Contributor

terhorstd commented Aug 13, 2025

@JanVogelsang, this is big work! Thanks a lot for this big cleaning effort!
Currently there still seem to be some build errors, but most of the changes seem fine to me. I commented a few places already, and will do a proper review once the CI passes.

Could you also check the relation to #3465 ?

@JanVogelsang
Copy link
Copy Markdown
Contributor Author

@JanVogelsang Thanks for checking and the new solutions! I slightly prefer the second option, but that is mainly based on the fact that I still haven't entirely wrapped my mind around the std::enable_if_t construct.

Is it necessary to place this into a separate namespace kernel? Why not have it in the nest namespace and then just use it as manager<SimulationManager>() or maybe even get<SimulationManager>() or (since we have get in many places) use<SimulationManager>() or something like that?

That's totally fine as well. The additional/nested namespace (nest::kernel::) was just intended to make the transition for other devs easier, as kernel().X_manager isn't that far away from kernel::manager<XManager>(). But of course, any other flavor of that is great as well. I think I like the use<>() flavor most, but that's just personal taste.

@JanVogelsang
Copy link
Copy Markdown
Contributor Author

@heplesser There is some issue when destructing the managers. Compared to main, the order in which the destructors are being called changed. Before, the managers were destructed together with the KernelManager itself, as they were just members of the KernelManager. There, the order is strictly defined, it is always the reverse order in which they are declared. However, for the global inline manager instances, they are destructed in the reverse order in which they are constructed, but the order in which they are constructed is not transparent and depends on the compiler.

For some reason, the order in which the managers are destructed seems to matter and incorrect order leads to segmentation faults due to double-frees. I couldn't figure out why, but noticed that removing the ModelManager destructor fixes the issue. I am not sure what implications this has though. I would argue that the destructor is not required for the ModelManager, but please take a look at that yourself.

@JanVogelsang
Copy link
Copy Markdown
Contributor Author

The remaining issues might actually be solved by #3748 as those are related to how integer values are passed between Python and C++. So let's get that merged first.

@heplesser
Copy link
Copy Markdown
Contributor

@JanVogelsang I am happy to report that your branch nicely builds on macOS 26.4.1 with Apple Clang 21 and that all tests pass.

The new conflicts are due to #3721 being merged with the flush events for eprop. That should not be difficult to disentangle (@akorgor Could you try to take a look at those and create a PR sorting them out towards @JanVogelsang's repo? Note that this PR is from his master branch.)

@heplesser
Copy link
Copy Markdown
Contributor

@JanVogelsang While all works, I do get from Apple Clang the warning

/Users/plesser/NEST/code/src/jv_3544/nestkernel/secondary_event.h:185:12: warning: instantiation of variable
      'nest::DataSecondaryEvent<double, nest::DelayedRateConnectionEvent>::supported_syn_ids_' required here, but no definition is available
      [-Wundefined-var-template]
  185 |     return supported_syn_ids_;
      |            ^
/Users/plesser/NEST/code/src/jv_3544/nestkernel/secondary_event.h:315:3: note: in instantiation of member function
      'nest::DataSecondaryEvent<double, nest::DelayedRateConnectionEvent>::get_supported_syn_ids' requested here
  315 |   DelayedRateConnectionEvent()
      |   ^
/Users/plesser/NEST/code/src/jv_3544/nestkernel/secondary_event.h:147:31: note: forward declaration of template entity is
      here
  147 |   static std::set< synindex > supported_syn_ids_;
      |                               ^

and corresponding ones for coeff_length_ in the same class. Any ideas?

I did a default CMake run. How do I turn on LTO?

@akorgor
Copy link
Copy Markdown
Contributor

akorgor commented Apr 14, 2026

The new conflicts are due to #3721 being merged with the flush events for eprop. That should not be difficult to disentangle (@akorgor Could you try to take a look at those and create a PR sorting them out towards @JanVogelsang's repo? Note that this PR is from his master branch.)

Done in JanVogelsang#14. Note that after merging these changes there is a todo left for @JanVogelsang to

  • refactor the new files flush_event_mechanism.{cpp,h} in the spirit of this PR.

@heplesser
Copy link
Copy Markdown
Contributor

@JanVogelsang I now tried to build with LTO on macOS using -Dwith-lto=ON. Cmake ran through, reported "LTO Yes" in the summary, but then at the very end of the summary reported an error and did not generate targets. I include the CMake Error message below, which seems to point to some components missing in the Apple Clang compiler. That would indicate, I think, a need to make sure that things will remain working without LTO in the long run (but I will se how to build with LTO on macOS). The other point is that the last part of the error message and the fact that the summary reported "LTO Yes" seem to point to a problem in our CMake code processing with-lto: Shouldn't CMake have stopped at once when it found that the requested LTO is not available?

CMake Error at cmake/ProcessOptions.cmake:705 (message):
  Clang with LTO requires llvm-ar
Call Stack (most recent call first):
  CMakeLists.txt:170 (nest_process_lto)


CMake Error at cmake/ProcessOptions.cmake:711 (message):
  Clang with LTO requires llvm-ranlib
Call Stack (most recent call first):
  CMakeLists.txt:170 (nest_process_lto)


CMake Warning (dev) at cmake/ProcessOptions.cmake:713 (elseif):
  ELSEIF called with no arguments, it will be skipped.
Call Stack (most recent call first):
  CMakeLists.txt:170 (nest_process_lto)
This warning is for project developers.  Use -Wno-dev to suppress it.

@gtrensch gtrensch removed this from the NEST 3.10 milestone Apr 17, 2026
@JanVogelsang
Copy link
Copy Markdown
Contributor Author

It turned out that I used CMake's SEND_ERROR message type, which outputs an error, but doesn't stop the process. I changed it to FATAL_ERROR now, which makes CMake stop processing and should thus solve this issue.
And I am not sure how to get LTO running on Mac. I just struggled a lot to get it running on Ubuntu as well and after plenty of attempts the current setup turned out to be the only way to make it work (i.e., force the compiler to use llvm-ar and llvm-ranlib). We might have to adapt that for the Apple compiler.

@heplesser
Copy link
Copy Markdown
Contributor

... on Ubuntu ... the current setup turned out to be the only way to make it work (i.e., force the compiler to use llvm-ar and llvm-ranlib). ...

So do you use llvm-ar/ranlib to link under Ubuntu even when compiling with GCC?

@JanVogelsang
Copy link
Copy Markdown
Contributor Author

No, for GCC the default ar and ranlib are used (gcc-ar, gcc-ranlib), this is handled in cmake/ProcessOptions.cmake.

Fix warnings about undefined template member when building with Clang
@heplesser
Copy link
Copy Markdown
Contributor

@JanVogelsang I sent you a new PR with much simplified ProcessOptions code which now works (with limits) on macOS (with Clang) and under Linux (with GCC; Clang not tested).

Under macOS, gcc-15 from Homebrew does not work on my system (macOS 26.4.1, Apple Silicon). This seems to be due to a bug in the newest Apple SDK (https://github.com/orgs/Homebrew/discussions/6778#discussioncomment-16557516).

Under Linux with gcc-13, I get a curious error from spin_detector.h:

/users/plesser/nest/src/jv_3544/models/spin_detector.cpp: In member function 'virtual void nest::spin_detector::update(const nest::Time&, long int, long int':
/users/plesser/nest/src/jv_3544/models/spin_detector.cpp:74:12: error: 'last_event_' was not declared in this scope
   74 |     write( last_event_, RecordingBackend::NO_DOUBLE_VALUES, { static_cast< int >( last_event_.get_weight() ) } );
      |            ^~~~~~~~~~~
/users/plesser/nest/src/jv_3544/models/spin_detector.cpp: In member function 'virtual void nest::spin_detector::handle(nest::SpikeEvent&)':
/users/plesser/nest/src/jv_3544/models/spin_detector.cpp:143:7: error: 'last_event_' was not declared in this scope
  143 |       last_event_.set_weight( 1.0 );
      |       ^~~~~~~~~~~
/users/plesser/nest/src/jv_3544/models/spin_detector.cpp:148:14: error: 'last_event_' was not declared in this scope
  148 |       write( last_event_, RecordingBackend::NO_DOUBLE_VALUES, { static_cast< int >( last_event_.get_weight() ) } );
      |              ^~~~~~~~~~~
/users/plesser/nest/src/jv_3544/models/spin_detector.cpp:160:9: error: 'last_event_' was not declared in this scope
  160 |         last_event_ = e;
      |         ^~~~~~~~~~~

I find this curious because spin_detector.h includes event.h, which provides the full declaration of SpikeEvent.

@JanVogelsang
Copy link
Copy Markdown
Contributor Author

@heplesser The PR unfortunately doesn't work for Clang 18 on Ubuntu. But strangely, I also don't get the spin_detector error with GCC 13.

@JanVogelsang
Copy link
Copy Markdown
Contributor Author

JanVogelsang commented Apr 18, 2026

Interestingly, the spin_detector issue also occurs when building with CI-beNNch. But also not here in our CI testsuite.

@heplesser
Copy link
Copy Markdown
Contributor

@JanVogelsang Thanks for the update. Here a few more observations (and a question) from my side:

  • The spin-detector problem does not occur with gcc-13 under macOS nor in my Ubuntu24 VM. I have only seen it on a cluster (gcc 13.3.0 on all systems).
  • With my minimized LTO-detection, things work with GCC 13 on Ubuntu 24. I did not need to set any extra flags.
  • I then did sudo apt-get install llvm clang g++-14 on my U24 VM followed by
cmake -DCMAKE_C_COMPILER=clang-18 -DCMAKE_CXX_COMPILER=clang++-18 -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install  -Dwith-mpi=OFF -Dwith-openmp=OFF -Dwith-hdf5=OFF -Dwith-lto=OFF  --fresh ../../src/jv_3544

with my 3544-branch (ie minimal settings for LTO) and everything built nicely out of the box. The duration of the final linking stage and the size of the so file clearly indicate that LTO has been done.

In file included from /Users/plesser/NEST/code/src/jv_3544/models/bernoulli_synapse.cpp:23:
In file included from /Users/plesser/NEST/code/src/jv_3544/models/bernoulli_synapse.h:28:
In file included from /opt/homebrew/Cellar/llvm/22.1.3/bin/../include/c++/v1/string:654:
In file included from /opt/homebrew/Cellar/llvm/22.1.3/bin/../include/c++/v1/string_view:975:
In file included from /opt/homebrew/Cellar/llvm/22.1.3/bin/../include/c++/v1/algorithm:1875:
In file included from /opt/homebrew/Cellar/llvm/22.1.3/bin/../include/c++/v1/__algorithm/make_heap.h:16:
/opt/homebrew/Cellar/llvm/22.1.3/bin/../include/c++/v1/__algorithm/sift_down.h:55:7: error: no matching function for call to object of type
      'std::less<boost::tuples::tuple<nest::Source, nest::ConnectionLabel<nest::bernoulli_synapse<nest::TargetIdentifierPtrRport>>>>'
   55 |   if (__comp(__first[__child], __first[__start]))
      |       ^~~~~~
/opt/homebrew/Cellar/llvm/22.1.3/bin/../include/c++/v1/__functional/operations.h:358:60: note: candidate function not viable: no known conversion from
      'operator_brackets_proxy<IteratorPair<bv_iterator<nest::Source, nest::Source &, nest::Source *>,
      bv_iterator<nest::ConnectionLabel<nest::bernoulli_synapse<nest::TargetIdentifierPtrRport>>, nest::ConnectionLabel<nest::bernoulli_synapse<nest::TargetIdentifierPtrRport>> &,
      nest::ConnectionLabel<nest::bernoulli_synapse<nest::TargetIdentifierPtrRport>> *>>>' to 'const boost::tuples::tuple<nest::Source,
      nest::ConnectionLabel<nest::bernoulli_synapse<nest::TargetIdentifierPtrRport>>>' for 1st argument

Any idea what might be going on here?

@JanVogelsang
Copy link
Copy Markdown
Contributor Author

@heplesser But in the CMake call you shared you specified "-Dwith-lto=OFF", so are you sure that LTO ran?

For the nest::sort issue, I have no idea, but an LLM suggested replacing the line by nest::sort( sources, C_, std::less<>() );. Maybe try that out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

I: Internal API Changes were introduced in basic internal workings of the simulator that developers need to know S: High Should be handled next T: Maintenance Work to keep up the quality of the code and documentation.

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

7 participants