Commit 23a16d0 (1 parent: 6a77280)

restore many guides, uses of Summit. Mention that it's decommissioned.

File tree: 8 files changed, +220 −3 lines changed

docs/advanced_installation.rst

Lines changed: 10 additions & 0 deletions

@@ -49,6 +49,10 @@ Further recommendations for selected HPC systems are given in the

    MPICC=mpiicc pip install mpi4py --no-binary mpi4py

On Summit, the following line is recommended (with gcc compilers)::

    CC=mpicc MPICC=mpicc pip install mpi4py --no-binary mpi4py

.. tab-item:: conda

Install libEnsemble with Conda_ from the conda-forge channel::

@@ -112,6 +116,12 @@ Further recommendations for selected HPC systems are given in the

    spack info py-libensemble

On some platforms you may wish to run libEnsemble without ``mpi4py``,
using a serial PETSc build. This is often preferable when running on
the launch nodes of a three-tier system (e.g., Summit)::

    spack install py-libensemble +scipy +mpmath +petsc4py ^py-petsc4py~mpi ^petsc~mpi~hdf5~hypre~superlu-dist

The installation will create modules for libEnsemble and the dependent
packages. These can be loaded by running::

docs/known_issues.rst

Lines changed: 2 additions & 0 deletions

@@ -19,6 +19,8 @@ may occur when using libEnsemble.

* Local comms mode (multiprocessing) may fail if MPI is initialized before
  forking processes. This is thought to be responsible for issues combining
  multiprocessing with PETSc on some platforms.
* Remote detection of logical cores via ``LSB_HOSTS`` (e.g., on Summit) returns the
  number of physical cores, as SMT information is not available.
* TCP mode does not support
  (1) more than one libEnsemble call in a given script or
  (2) the auto-resources option to the Executor.
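For illustration, the ``LSB_HOSTS`` behavior above can be sketched with a small helper: LSF lists one hostname per allocated slot, and on Summit only physical cores appeared in that list. The helper name, the hostnames, and the ``batch`` prefix filter are illustrative assumptions, not libEnsemble code:

```python
from collections import Counter

def cores_per_node_from_lsb_hosts(lsb_hosts: str) -> dict:
    """Count slot entries per host in an LSB_HOSTS-style string.

    LSF lists one hostname per allocated slot; on Summit this reflected
    physical cores only, since SMT (hardware-thread) info was not exposed.
    Filtering out batch/launch node entries is an illustrative assumption.
    """
    hosts = lsb_hosts.split()
    return dict(Counter(h for h in hosts if not h.startswith("batch")))

# Example: two compute nodes with 42 physical cores each appear as
# 42 repeats of each hostname (SMT threads are not listed).
sample = " ".join(["batch1"] + ["h41n01"] * 42 + ["h41n02"] * 42)
counts = cores_per_node_from_lsb_hosts(sample)  # -> {"h41n01": 42, "h41n02": 42}
```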

docs/nitpicky

Lines changed: 1 addition & 1 deletion

@@ -46,7 +46,7 @@ py:class libensemble.resources.platforms.Perlmutter

  py:class libensemble.resources.platforms.PerlmutterCPU
  py:class libensemble.resources.platforms.PerlmutterGPU
  py:class libensemble.resources.platforms.Polaris
- py:class libensemble.resources.platforms.Sunspot
+ py:class libensemble.resources.platforms.Summit
  py:class libensemble.resources.rset_resources.RSetResources
  py:class libensemble.resources.env_resources.EnvResources
  py:class libensemble.resources.resources.Resources

docs/platforms/example_scripts.rst

Lines changed: 6 additions & 0 deletions

@@ -28,3 +28,9 @@ information about the respective systems and configuration.

.. literalinclude:: ../../examples/libE_submission_scripts/bebop_submit_slurm_distrib.sh
   :caption: /examples/libE_submission_scripts/bebop_submit_slurm_distrib.sh
   :language: bash

.. dropdown:: Summit (Decommissioned) - On Launch Nodes with Multiprocessing

   .. literalinclude:: ../../examples/libE_submission_scripts/summit_submit_mproc.sh
      :caption: /examples/libE_submission_scripts/summit_submit_mproc.sh
      :language: bash

docs/platforms/platforms_index.rst

Lines changed: 24 additions & 2 deletions

@@ -80,6 +80,26 @@ per worker, and adding the manager onto the first node.

HPC systems that allow only one application to be launched to a node at any one time
will not allow a distributed configuration.

Systems with Launch/MOM Nodes
-----------------------------

Some large systems have a three-tier node setup; that is, they have a separate set of launch nodes
(known as MOM nodes on Cray systems). User batch jobs or interactive sessions run on a launch node.
Most such systems supply a special MPI runner that has some application-level scheduling
capability (e.g., ``aprun``, ``jsrun``), and MPI applications can only be submitted from these nodes.
Examples of such systems include Summit and Sierra.

There are two ways of running libEnsemble on these kinds of systems. The first, and simplest,
is to run libEnsemble on the launch nodes. This is often sufficient if the workers' simulation
or generation functions are not doing much work (other than launching applications). This approach
is inherently centralized; the entire node allocation is available for the worker-launched tasks.

However, running libEnsemble on the compute nodes is potentially more scalable and
will better manage simulation and generation functions that contain considerable
computational work or I/O. The second option, therefore, is to use a proxy task-execution
service such as Balsam_.

Balsam - Externally Managed Applications
----------------------------------------

@@ -190,11 +210,13 @@ libEnsemble on specific HPC systems.

   :titlesonly:

   aurora
   bebop
   frontier
   improv
   perlmutter
   polaris
   spock_crusher
   summit
   srun
   example_scripts

docs/platforms/summit.rst

Lines changed: 158 additions & 0 deletions

@@ -0,0 +1,158 @@

=======================
Summit (Decommissioned)
=======================

Summit_ was an IBM AC922 system located at the Oak Ridge Leadership Computing
Facility (OLCF). Each of the approximately 4,600 compute nodes on Summit contained two
IBM POWER9 processors and six NVIDIA Volta V100 accelerators.

Summit featured three tiers of nodes: login, launch, and compute nodes.

Users on the login nodes submit batch runs to the launch nodes.
Batch scripts and interactive sessions run on the launch nodes. Only the launch
nodes can submit MPI runs to the compute nodes via ``jsrun``.

These docs are maintained to guide libEnsemble's usage on three-tier systems similar to Summit.

Special note on resource sets and Executor submit options
---------------------------------------------------------

When using the portable MPI run configuration options (e.g., ``num_nodes``) with the
:doc:`MPIExecutor<../executor/mpi_executor>` ``submit`` function, it is important
to note that, due to the `resource sets`_ used on Summit, the options refer to
resource sets as follows:

- num_procs (int, optional) – The total number of resource sets for this run.
- num_nodes (int, optional) – The number of nodes on which to submit the run.
- procs_per_node (int, optional) – The number of resource sets per node.

It is recommended that the user define a resource set as the minimal configuration
of CPU cores/processes and GPUs. These can be added to the ``extra_args`` option
of the *submit* function. Alternatively, the portable options can be ignored and
everything expressed in ``extra_args``.

For example, the following *jsrun* line would run three resource sets,
each having one core (with one process) and one GPU, along with some extra options::

    jsrun -n 3 -a 1 -g 1 -c 1 --bind=packed:1 --smpiargs="-gpu"

Expressing this line in the ``submit`` function may look
something like the following::

    exctr = Executor.executor
    task = exctr.submit(app_name="mycode",
                        num_procs=3,
                        extra_args="-a 1 -g 1 -c 1 --bind=packed:1 --smpiargs='-gpu'",
                        app_args="-i input")
This would be equivalent to::

    exctr = Executor.executor
    task = exctr.submit(app_name="mycode",
                        extra_args="-n 3 -a 1 -g 1 -c 1 --bind=packed:1 --smpiargs='-gpu'",
                        app_args="-i input")
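Under these conventions, the mapping from the portable options to a ``jsrun`` command line can be sketched as follows. This is a hypothetical helper written for illustration, not part of libEnsemble's API; the derivation of ``-r`` from ``num_procs // num_nodes`` is an assumption about even packing:

```python
def jsrun_args(num_procs=None, num_nodes=None, procs_per_node=None, extra_args=""):
    """Sketch: map libEnsemble's portable submit options to jsrun flags.

    On Summit the portable options referred to resource sets:
    num_procs -> total resource sets (-n), procs_per_node -> resource
    sets per node (-r).
    """
    parts = ["jsrun"]
    if num_procs is not None:
        parts.append(f"-n {num_procs}")
    if procs_per_node is not None:
        parts.append(f"-r {procs_per_node}")
    elif num_nodes is not None and num_procs is not None:
        # Assume resource sets pack evenly across the requested nodes.
        parts.append(f"-r {num_procs // num_nodes}")
    if extra_args:
        parts.append(extra_args)
    return " ".join(parts)

cmd = jsrun_args(num_procs=3, extra_args="-a 1 -g 1 -c 1 --bind=packed:1")
# -> "jsrun -n 3 -a 1 -g 1 -c 1 --bind=packed:1"
```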
The libEnsemble resource manager works out the resources available to each worker,
but unlike some other systems, ``jsrun`` on Summit dynamically schedules runs to
available slots across and within nodes. It can also queue tasks. This allows
variable-size runs to be handled easily on Summit. If oversubscription of the
``jsrun`` system is desired, then libEnsemble's resource manager can be disabled
in the calling script via::

    libE_specs["disable_resource_manager"] = True

In the above example, the task being submitted used three GPUs, which is half of
those available on a Summit node, so two such tasks could be allocated to each node
(from different workers) if they were running at the same time.
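The packing arithmetic in that example can be made explicit. This is a minimal sketch under the stated assumption that packing is limited by GPU count alone (six GPUs per Summit node); the function name is illustrative:

```python
GPUS_PER_NODE = 6  # a Summit node had six V100 GPUs

def tasks_per_node(gpus_per_task: int, gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Number of concurrent tasks one node can host, by GPU count alone."""
    return gpus_per_node // gpus_per_task

slots = tasks_per_node(3)  # three-GPU tasks pack two per node -> 2
```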
Job Submission
--------------

Summit used LSF_ for job management and submission. For libEnsemble, the most
important command is ``bsub``, used for submitting batch scripts from the login
nodes to execute on the launch nodes.

It is recommended to run libEnsemble on the launch nodes (assuming workers are
submitting MPI applications) using the ``local`` communications mode (multiprocessing).

Interactive Runs
^^^^^^^^^^^^^^^^

You can run interactively with ``bsub`` by specifying the ``-Is`` flag,
similarly to the following::

    $ bsub -W 30 -P [project] -nnodes 8 -Is

This will place you on a launch node.

.. note::
    You will need to reactivate your conda virtual environment.

Batch Runs
^^^^^^^^^^

Batch scripts specify run settings using ``#BSUB`` statements. The following
simple example depicts configuring and launching libEnsemble to a launch node with
multiprocessing. This script also assumes the user is using the ``parse_args()``
convenience function from libEnsemble's :doc:`tools module<../utilities>`.

.. code-block:: bash

    #!/bin/bash -x
    #BSUB -P <project code>
    #BSUB -J libe_mproc
    #BSUB -W 60
    #BSUB -nnodes 128
    #BSUB -alloc_flags "smt1"

    # --- Prepare Python ---

    # Load conda module and gcc.
    module load python
    module load gcc

    # Name of conda environment
    export CONDA_ENV_NAME=my_env

    # Activate conda environment
    export PYTHONNOUSERSITE=1
    source activate $CONDA_ENV_NAME

    # --- Prepare libEnsemble ---

    # Name of calling script
    export EXE=calling_script.py

    # Communication method
    export COMMS="--comms local"

    # Number of workers
    export NWORKERS="--nworkers 128"

    hash -r  # Check no commands hashed (pip/python...)

    # Launch libE
    python $EXE $COMMS $NWORKERS > out.txt 2>&1

With this saved as ``myscript.sh``, allocating, configuring, and queueing
libEnsemble on Summit is achieved by running::

    $ bsub myscript.sh

Example submission scripts are also given in the :doc:`examples<example_scripts>`.

Launching User Applications from libEnsemble Workers
----------------------------------------------------

Only the launch nodes can submit MPI runs to the compute nodes via ``jsrun``.
This can be accomplished in user simulator functions directly. However, it is highly
recommended that the :doc:`Executor<../executor/ex_index>` interface
be used inside the simulator or generator, because this provides a portable interface
with many advantages, including automatic resource detection, launch-failure
resilience, and ease of use.

.. _conda: https://conda.io/en/latest/
.. _LSF: https://www.olcf.ornl.gov/wp-content/uploads/2018/12/summit_workshop_fuson.pdf
.. _mpi4py: https://mpi4py.readthedocs.io/en/stable/

docs/running_libE.rst

Lines changed: 7 additions & 0 deletions

@@ -66,6 +66,10 @@ supercomputers.

from app-launches (if running libEnsemble on a compute node),
set ``libE_specs["dedicated_mode"] = True``.

This mode can also be used to run on a **launch** node of a three-tier
system (e.g., Summit), ensuring the whole compute-node allocation is available
for launching apps. Make sure there are no imports of ``mpi4py`` in your Python scripts.

Note that on macOS (since Python 3.8) and Windows, the default multiprocessing method
is ``"spawn"`` instead of ``"fork"``; to resolve many related issues, we recommend placing
calling script code in an ``if __name__ == "__main__":`` block.

@@ -100,6 +104,9 @@ supercomputers.

(see :doc:`Balsam<executor/balsam_2_executor>`). This nesting does work
with MPICH_ and its derivative MPI implementations.

This mode is also unsuitable when running on the **launch** nodes of
three-tier systems (e.g., Summit); in that case, ``local`` mode is recommended.

.. tab-item:: TCP Comms

Run the Manager on one system and launch workers to remote

libensemble/resources/platforms.py

Lines changed: 12 additions & 0 deletions

@@ -153,6 +153,16 @@ class Frontier(Platform):

    scheduler_match_slots: bool = False


class Summit(Platform):
    mpi_runner: str = "jsrun"
    cores_per_node: int = 42
    logical_cores_per_node: int = 168
    gpus_per_node: int = 6
    gpu_setting_type: str = "option_gpus_per_task"
    gpu_setting_name: str = "-g"
    scheduler_match_slots: bool = False


# Example of a ROCM system
class GenericROCm(Platform):
    mpi_runner: str = "mpich"

@@ -236,13 +246,15 @@ class Known_platforms(BaseModel):

    perlmutter_c: PerlmutterCPU = PerlmutterCPU()
    perlmutter_g: PerlmutterGPU = PerlmutterGPU()
    polaris: Polaris = Polaris()
    summit: Summit = Summit()


# Dictionary of known systems (or system partitions) detectable by domain name
detect_systems = {
    "frontier.olcf.ornl.gov": "frontier",
    "hostmgmt.cm.aurora.alcf.anl.gov": "aurora",
    "hsn.cm.polaris.alcf.anl.gov": "polaris",
    "summit.olcf.ornl.gov": "summit",  # Need to detect gpu count
}
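The ``detect_systems`` dictionary maps domain names to platform entries. A minimal sketch of how such a lookup could work is below, assuming matching by domain suffix (as the keys suggest); libEnsemble's actual detection logic may differ, and the helper name is illustrative:

```python
# Illustrative subset of the domain-to-platform table shown above.
detect_systems = {
    "frontier.olcf.ornl.gov": "frontier",
    "summit.olcf.ornl.gov": "summit",
}

def detect_platform(fqdn: str):
    """Return the platform whose domain suffix matches fqdn, else None.

    Suffix matching is an assumption made for this sketch; in practice the
    fqdn could come from socket.getfqdn() on the running node.
    """
    for domain, name in detect_systems.items():
        if fqdn.endswith(domain):
            return name
    return None

platform = detect_platform("login1.summit.olcf.ornl.gov")  # -> "summit"
```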
