Remove all references to LSF (#780)

Remove all references to LSF and LSB components, including `JsrunSettings`, `BsubBatchSettings`, and so on. [ committed by @al-rigazzi ]
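
For downstream scripts that may still reference the removed components, a minimal sketch of a scan is shown below. It is not part of this commit. The helper name `find_lsf_references` and the regular expression are hypothetical. The names searched for are limited to those mentioned in this commit's description and diff, so the list may be incomplete.

```python
import re

# Names taken from this commit's description and diff; illustrative only,
# and possibly not a complete list of removed symbols.
REMOVED_NAMES = ("JsrunSettings", "BsubBatchSettings")

# Hypothetical pattern: matches launcher="lsf" or launcher='lsf'.
LSF_LAUNCHER = re.compile(r"""launcher\s*=\s*["']lsf["']""")


def find_lsf_references(source: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_text) pairs for LSF-related references."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Plain substring check for the removed settings classes.
        for name in REMOVED_NAMES:
            if name in line:
                hits.append((lineno, name))
        # Regex check for an explicit LSF launcher selection.
        match = LSF_LAUNCHER.search(line)
        if match:
            hits.append((lineno, match.group(0)))
    return hits
```

Passing the text of a user script to `find_lsf_references` returns the line numbers that likely need attention before moving to a release containing this change. The scan is purely textual, so it can also flag comments or strings.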
CrayLabs · Dec 10, 2024 · f509ad6
1 parent 4701e8c
commit f509ad6

Showing 45 changed files with 56 additions and 2,706 deletions.
diff --git a/.github/workflows/run_tests.yml b/.github/workflows/run_tests.yml
@@ -55,7 +55,7 @@ jobs:
       fail-fast: false
       matrix:
         subset: [backends, slow_tests, group_a, group_b]
-        os: [macos-12, macos-14, ubuntu-22.04] # Operating systems
+        os: [macos-14, ubuntu-22.04] # Operating systems
         compiler: [8] # GNU compiler version
         rai: [1.2.7] # Redis AI versions
         py_v: ["3.9", "3.10", "3.11"] # Python versions

diff --git a/.wci.yml b/.wci.yml
@@ -10,7 +10,7 @@
       Machine Learning (ML) libraries, like PyTorch and TensorFlow,
       in combination with High Performance Computing (HPC) simulations and applications.
       SmartSim launches ML infrastructure on HPC systems alongside user workloads
-      and supports most HPC workload managers (e.g. Slurm, PBSPro, LSF).
+      and supports most HPC workload managers (e.g. Slurm, PBSPro, SGE).
       SmartSim also provides a set of client libraries in Python, C++, C, and Fortran.
       These client libraries allow users to send and receive data between user
       applications and the machine learning infrastructure.  Moreover, the
@@ -40,7 +40,7 @@
       resource_managers:
         - Slurm
         - PBSPro
-        - LSF
+        - SGE
         - Linux/MacOS
       transfer_protocols:
         - TCP/IP

diff --git a/README.md b/README.md
@@ -144,7 +144,6 @@ SmartSim](https://www.craylabs.org/docs/api/smartsim_api.html#settings).
  - ``MpirunSettings``
  - ``SrunSettings``
  - ``AprunSettings``
- - ``JsrunSettings``
 
 The following example launches a hello world MPI program using the local launcher
 for single compute node, workstations and laptops.
@@ -177,7 +176,7 @@ SmartSim integrates with common HPC schedulers providing batch and interactive
 launch capabilities for all applications:
 
  - Slurm
- - LSF
+ - SGE
  - PBSPro
  - Local (for laptops/single node, no batch)
 
@@ -197,11 +196,9 @@ salloc -N 3 --ntasks-per-node=20 --ntasks 60 --exclusive -t 00:10:00
 # get interactive allocation (PBS)
 qsub -l select=3:ncpus=20 -l walltime=00:10:00 -l place=scatter -I -q <queue>
 
-# get interactive allocation (LSF)
-bsub -Is -W 00:10 -nnodes 3 -P <project> $SHELL
 ```
 
-This same script will run on a SLURM, PBS, or LSF system as the ``launcher``
+This same script will run on a SLURM, PBS, or SGE system as the ``launcher``
 is set to `auto` in the [Experiment](https://www.craylabs.org/docs/api/smartsim_api.html#experiment)
 initialization. The run command like ``mpirun``,
 ``aprun`` or ``srun`` will be automatically detected from what is available on the
@@ -281,7 +278,7 @@ python hello_ensemble.py
 ```
 
 Similar to the interactive example, this same script will run on a SLURM, PBS,
-or LSF system as the ``launcher`` is set to `auto` in the
+or SGE system as the ``launcher`` is set to `auto` in the
 [Experiment](https://www.craylabs.org/docs/api/smartsim_api.html#experiment)
 initialization. Local launching does not support batch workloads.
 
@@ -343,9 +340,6 @@ salloc -N 3 --ntasks-per-node=1 --exclusive -t 00:10:00
 # get interactive allocation (PBS)
 qsub -l select=3:ncpus=1 -l walltime=00:10:00 -l place=scatter -I -q queue
 
-# get interactive allocation (LSF)
-bsub -Is -W 00:10 -nnodes 3 -P project $SHELL
-
 ```
 
 ```python

diff --git a/conftest.py b/conftest.py
@@ -62,7 +62,6 @@
 from smartsim.settings import (
     AprunSettings,
     DragonRunSettings,
-    JsrunSettings,
     MpiexecSettings,
     MpirunSettings,
     PalsMpiexecSettings,
@@ -120,7 +119,7 @@ def print_test_configuration() -> None:
 
 def pytest_configure() -> None:
     pytest.test_launcher = test_launcher
-    pytest.wlm_options = ["slurm", "pbs", "lsf", "pals", "dragon", "sge"]
+    pytest.wlm_options = ["slurm", "pbs", "pals", "dragon", "sge"]
     account = get_account()
     pytest.test_account = account
     pytest.test_device = test_device
@@ -386,15 +385,10 @@ def get_base_run_settings(
             run_args = {"--np": ntasks, "--hostfile": host_file}
             run_args.update(kwargs)
             return RunSettings(exe, args, run_command="mpiexec", run_args=run_args)
-        if test_launcher == "lsf":
-            run_args = {"--np": ntasks, "--nrs": nodes}
-            run_args.update(kwargs)
-            settings = RunSettings(exe, args, run_command="jsrun", run_args=run_args)
-            return settings
         if test_launcher != "local":
             raise SSConfigError(
                 "Base run settings are available for Slurm, PBS, "
-                f"and LSF, but launcher was {test_launcher}"
+                f"and Dragon, but launcher was {test_launcher}"
             )
         # TODO allow user to pick aprun vs MPIrun
         return RunSettings(exe, args)
@@ -429,13 +423,6 @@ def get_run_settings(
             run_args = {"np": ntasks, "hostfile": host_file}
             run_args.update(kwargs)
             return PalsMpiexecSettings(exe, args, run_args=run_args)
-        if test_launcher == "lsf":
-            run_args = {
-                "nrs": nodes,
-                "tasks_per_rs": max(ntasks // nodes, 1),
-            }
-            run_args.update(kwargs)
-            return JsrunSettings(exe, args, run_args=run_args)
 
         return RunSettings(exe, args)
 

diff --git a/doc/api/smartsim_api.rst b/doc/api/smartsim_api.rst
@@ -59,11 +59,9 @@ Types of Settings:
     MpirunSettings
     MpiexecSettings
     OrterunSettings
-    JsrunSettings
     DragonRunSettings
     SbatchSettings
     QsubBatchSettings
-    BsubBatchSettings
 
 Settings objects can accept a container object that defines a container
 runtime, image, and arguments to use for the workload. Below is a list of
@@ -187,41 +185,6 @@ for Slurm and PBS sessions, respectively).
     :members:
 
 
-.. _jsrun_api:
-
-JsrunSettings
--------------
-
-
-``JsrunSettings`` can be used on any system that supports the
-IBM LSF launcher.
-
-``JsrunSettings`` can be used in interactive session (on allocation)
-and within batch launches (i.e. ``BsubBatchSettings``)
-
-
-.. autosummary::
-
-    JsrunSettings.set_num_rs
-    JsrunSettings.set_cpus_per_rs
-    JsrunSettings.set_gpus_per_rs
-    JsrunSettings.set_rs_per_host
-    JsrunSettings.set_tasks
-    JsrunSettings.set_tasks_per_rs
-    JsrunSettings.set_binding
-    JsrunSettings.make_mpmd
-    JsrunSettings.set_mpmd_preamble
-    JsrunSettings.update_env
-    JsrunSettings.set_erf_sets
-    JsrunSettings.format_env_vars
-    JsrunSettings.format_run_args
-
-
-.. autoclass:: JsrunSettings
-    :inherited-members:
-    :undoc-members:
-    :members:
-
 .. _openmpi_run_api:
 
 MpirunSettings
@@ -361,33 +324,6 @@ be launched as a batch on PBSPro systems.
     :members:
 
 
-.. _bsub_api:
-
-BsubBatchSettings
------------------
-
-
-``BsubBatchSettings`` are used to configure jobs that should
-be launched as a batch on LSF systems.
-
-
-.. autosummary::
-
-    BsubBatchSettings.set_walltime
-    BsubBatchSettings.set_smts
-    BsubBatchSettings.set_project
-    BsubBatchSettings.set_nodes
-    BsubBatchSettings.set_expert_mode_req
-    BsubBatchSettings.set_hostlist
-    BsubBatchSettings.set_tasks
-    BsubBatchSettings.format_batch_args
-
-
-.. autoclass:: BsubBatchSettings
-    :inherited-members:
-    :undoc-members:
-    :members:
-
 .. _singularity_api:
 
 Singularity

diff --git a/doc/batch_settings.rst b/doc/batch_settings.rst
@@ -16,8 +16,6 @@ launching capabilities tailored for specific workload managers (WLMs). Each Smar
    - :ref:`SbatchSettings<sbatch_api>`
 - The PBS Pro `launcher` supports:
    - :ref:`QsubBatchSettings<qsub_api>`
-- The LSF `launcher` supports:
-   - :ref:`BsubBatchSettings<bsub_api>`
 
 .. note::
       The local `launcher` does not support batch jobs.
@@ -97,31 +95,6 @@ Below are examples of how to initialize a ``BatchSettings`` object per `launcher
         If `launcher="auto"`, SmartSim will detect that the ``Experiment`` is running on a PBS Pro based
         machine and set the launcher to `"pbs"`.
 
-    .. group-tab:: LSF
-      To instantiate the ``BsubBatchSettings`` object, which interfaces with the LSF job scheduler, specify
-      `launcher="lsf"` when initializing the ``Experiment``. Upon calling ``create_batch_settings``,
-      SmartSim will detect the job scheduler and return the appropriate batch settings object.
-
-        .. code-block:: python
-
-            from smartsim import Experiment
-
-            # Initialize the experiment and provide launcher LSF
-            exp = Experiment("name-of-experiment", launcher="lsf")
-
-            # Initialize a BsubBatchSettings object
-            bsub_batch_settings = exp.create_batch_settings(nodes=1, time="10:00:00", batch_args={"ntasks": 1})
-            # Set the account for the lsf batch job
-            bsub_batch_settings.set_account("12345-Cray")
-            # Set the partition for the lsf batch job
-            bsub_batch_settings.set_queue("default")
-
-      The initialized ``BsubBatchSettings`` instance can now be passed to a SmartSim entity
-      (``Model`` or ``Ensemble``) via the `batch_settings` argument in ``create_batch_settings``.
-
-      .. note::
-        If `launcher="auto"`, SmartSim will detect that the ``Experiment`` is running on a LSF based
-        machine and set the launcher to `"lsf"`.
 
 .. warning::
       Note that initialization values provided (e.g., `nodes`, `time`, etc) will overwrite the same arguments in `batch_args` if present.

diff --git a/doc/changelog.md b/doc/changelog.md
@@ -13,25 +13,31 @@ To be released at some point in the future
 
 Description
 
+- Terminate LSF and LSB support
 - Implement workaround for Tensorflow that allows RedisAI to build with GCC-14
 - Add instructions for installing SmartSim on PML's Scylla
 - Fix typos in documentation
 
 Detailed Notes
 
+- After the supercomputer Summit was decommissioned, a decision was made to
+  terminate SmartSim's support of the LSF launcher and LSB scheduler. If
+  this impacts your work, please contact us.
+  ([SmartSim-PR780](https://github.com/CrayLabs/SmartSim/pull/780))
+- Fix typos in the `train_surrogate` tutorial documentation.
+  ([SmartSim-PR758](https://github.com/CrayLabs/SmartSim/pull/758))
+- PML's Scylla is still under development. The usual SmartSim
+  build instructions do not apply because the GPU dependencies
+  have yet to be installed at a system-wide level. Scylla has
+  its own entry in the documentation.
+  ([SmartSim-PR733](https://github.com/CrayLabs/SmartSim/pull/733))
 - In libtensorflow, the input argument to TF_SessionRun seems to be mistyped to
   TF_Output instead of TF_Input. These two types differ only in name. GCC-14
   catches this and throws an error, even though earlier versions allow this. To
   solve this problem, patches are applied to the Tensorflow backend in RedisAI.
   Future versions of Tensorflow may fix this problem, but for now this seems to be
   the best workaround.
   ([SmartSim-PR738](https://github.com/CrayLabs/SmartSim/pull/738))
-- PML's Scylla is still under development. The usual SmartSim
-  build instructions do not apply because the GPU dependencies
-  have yet to be installed at a system-wide level. Scylla has
-  its own entry in the documentation.
-  ([SmartSim-PR733](https://github.com/CrayLabs/SmartSim/pull/733))
-- Fix typos in the `train_surrogate` tutorial documentation
 
 
 ### 0.8.0

diff --git a/doc/developer.rst b/doc/developer.rst
@@ -90,7 +90,7 @@ If any of the above commands are used, the test suite will run the "light" test
 suite by default.
 
 
-PBSPro, Slurm, LSF
+PBSPro, Slurm, SGE
 ==================
 
 To run the full test suite, users will have to be on a system with one of the
@@ -105,17 +105,14 @@ of at least 3 nodes.
   # for PBSPro (with aprun)
   qsub -l select=3 -l place=scatter -l walltime=00:10:00 -q queue
 
-  # for LSF (with jsrun)
-  bsub -Is -W 00:30 -nnodes 3 -P project $SHELL
-
 Values for queue, account, or project should be substituted appropriately.
 
 Once in an iterative allocation, users will need to set the test launcher
 environment variable: ``SMARTSIM_TEST_LAUNCHER`` to one of the following values
 
  - slurm
  - pbs
- - lsf
+ - sge
  - local
 
 If tests have to run on an account or project, the environment variable

diff --git a/doc/experiment.rst b/doc/experiment.rst
@@ -7,7 +7,7 @@ Overview
 SmartSim helps automate the deployment of AI-enabled workflows on HPC systems. With SmartSim, users
 can describe and launch combinations of applications and AI/ML infrastructure to produce novel and
 scalable workflows. SmartSim supports launching these workflows on a diverse set of systems, including
-local environments such as Mac or Linux, as well as HPC job schedulers (e.g. Slurm, PBS Pro, and LSF).
+local environments such as Mac or Linux, as well as HPC job schedulers (e.g. Slurm, PBS Pro, and SGE).
 
 The ``Experiment`` API is SmartSim's top level API that provides users with methods for creating, combining,
 configuring, launching and monitoring :ref:`entities<entities_exp_docs>` in an AI-enabled workflow. More specifically, the
@@ -49,7 +49,7 @@ workflow in the :ref:`Example<exp_example>` section of this page.
 Launchers
 =========
 SmartSim supports launching AI-enabled workflows on a wide variety of systems, including locally on a Mac or
-Linux machine or on HPC machines with a job scheduler (e.g. Slurm, PBS Pro, and LSF). When creating a SmartSim
+Linux machine or on HPC machines with a job scheduler (e.g. Slurm, PBS Pro, and SGE). When creating a SmartSim
 ``Experiment``, the user has the opportunity to specify the `launcher` type or defer to automatic `launcher` selection.
 `Launcher` selection determines how SmartSim translates entity configurations into system calls to launch,
 manage, and monitor. Currently, SmartSim supports 7 `launcher` options:
@@ -58,7 +58,7 @@ manage, and monitor. Currently, SmartSim supports 7 `launcher` options:
 2. ``slurm``: for systems using the Slurm scheduler
 3. ``pbs``: for systems using the PBS Pro scheduler
 4. ``pals``: for systems using the PALS scheduler
-5. ``lsf``: for systems using the LSF scheduler
+5. ``sge``: for systems using the SGE scheduler
 6. ``dragon``: if Dragon is installed in the current Python environment, see :ref:`Dragon Install <dragon_install>`
 7. ``auto``: have SmartSim auto-detect the launcher to use (will not detect ``dragon``)
 

diff --git a/doc/overview.rst b/doc/overview.rst
@@ -61,7 +61,7 @@ The key features of the IL are:
   - An API to start, monitor, and stop HPC jobs from Python or from a Jupyter notebook.
   - Automated deployment of in-memory data staging (`Redis <https://redis.io>`_) and computational
     storage (`RedisAI <https://redisai.io>`_).
-  - Programmatic launches of batch and in-allocation jobs on PBS, Slurm, and LSF systems.
+  - Programmatic launches of batch and in-allocation jobs on PBS, Slurm, and SGE systems.
   - Creating and configuring ensembles of workloads with isolated communication channels.
 
 The IL can configure and launch batch jobs as well as jobs within interactive