
CUPiD Config

will wieder edited this page Feb 27, 2026 · 12 revisions

CUPiD Configuration

While we’re waiting for ILAMB and LDF to run, let’s revisit the config.yml file -- it’s broken into a few distinct sections.

data_sources Section

The first section in the config.yml file is data_sources:

################
# Data Sources #
################
data_sources:
    # run_dir is the path to the folder you want
    ### all the files associated with this configuration
    ### to be created in
    run_dir: .

    # nb_path_root is the path to the folder that cupid will
    ### look for your template notebooks in. It doesn't have to
    ### be inside run_dir, or be specific to this project, as
    ### long as the notebooks are there
    nb_path_root: ../../nblibrary

This typically does not need to be edited by the user, and may be removed in favor of command-line arguments to the cupid-diagnostics script. It points CUPiD to the notebook library and also tells CUPiD where to execute the notebooks (we want the notebooks to be run in the output directory rather than the nblibrary directory).

computation_config Section

Much like the data_sources section, this section typically does not need to be modified by users and may turn into command-line arguments. It provides the name of the conda environment to run notebooks in by default (users can specify different environments for individual notebooks), and it also sets logging information:

######################
# Computation Config #
######################

computation_config:

    # default_kernel_name is the name of the environment that
    ### the notebooks in this configuration will be run in by default.
    ### It must already be installed on your machine. You can also
    ### specify a different environment than the default for any
    ### notebook in NOTEBOOK CONFIG
    default_kernel_name: cupid-analysis

    # log_level sets how verbose the logging will be.
    # options include: debug, info, warning, error
    log_level: 'info'

global_params Section

There are some parameters that are passed to every notebook. These are typically variables associated with the runs being compared (things like CESM case names, location of data, length of the run, and so on).

# All parameters under global_params get passed to all the notebooks

global_params:
  case_name: 'ctsm5.4.002_clm6_BGCcrop_crujra_4x5_HIST'
  base_case_name: 'ctsm5.4.002_clm5_BGCcrop_gswp3_4x5_HIST'
  case_nickname: 'clm5.4.002 crujra'
  base_case_nickname: 'clm5.0 gswp3'
  CESM_output_dir: /glade/campaign/cesm/development/cross-wg/diagnostic_framework/CESM_output_for_testing
  base_case_output_dir: /glade/campaign/cesm/development/cross-wg/diagnostic_framework/CESM_output_for_testing
  start_date: '1995-01-01'
  end_date: '2023-01-01'
  climo_start_year: 1995
  climo_end_year: 2014
  base_start_date: '1995-01-01'
  base_end_date: '2014-01-01'
  base_climo_start_year: 1995
  base_climo_end_year: 2014
  obs_data_dir: '/glade/campaign/cesm/development/cross-wg/diagnostic_framework/CUPiD_obs_data'
  ts_dir: null # If this is set to null, it will default to CESM_output_dir; if you don't have permissions to write to CESM_output_dir, you can specify a directory such as your scratch archive directory
  lc_kwargs:
    threads_per_worker: 1

time_series Section

One of the data standardization tasks CUPiD does is converting CESM history files to time series files (rather than have many variables at a single time level, these files are a single variable at many time levels). The LDF generates its own time series files (though in the future we would prefer to rely on CUPiD's tool for consistency), and ILAMB can read history files, so we won’t spend much time discussing it. Also, the interface for this section is still under development, and will likely change in the near future -- for example, we should be able to use the case names from global_params instead of specifying case_name here.
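As a conceptual illustration of that reorganization (this is not CUPiD's implementation, which operates on netCDF files; the variable names are just examples):

```python
# History files: each record holds many variables at a single time level.
history = [
    {"time": 0, "GPP": 1.0, "ELAI": 2.0},
    {"time": 1, "GPP": 1.1, "ELAI": 2.1},
]

# Time series files: each holds a single variable at many time levels.
timeseries = {
    var: [record[var] for record in history]
    for var in ("GPP", "ELAI")
}

print(timeseries["GPP"])  # → [1.0, 1.1]
```

Tools like ILAMB and many analysis workflows prefer this per-variable layout because a variable's full record can be read without touching the rest of the output.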

timeseries:
  num_procs: 8
  ts_done: [False, False]
  overwrite_ts: [False, False]
  case_name: ['ctsm5.4.002_clm6_BGCcrop_crujra_4x5_HIST','ctsm5.4.002_clm5_BGCcrop_gswp3_4x5_HIST']
  file_mode: 664
  dir_mode: 775
  file_group: cesm
  dir_group: cesm

  lnd:
    vars: []
    derive_vars: []
    hist_str: 'clm2.h0a'
    start_years: [1995,1995]
    end_years: [2023,2014]
    level: 'lev'

Want to generate a timeseries?

Note that the timeseries output directory ts_dir is set to null in the global_params section of this example. You can create timeseries files, but you cannot save them to CESM_output_dir as you normally would, because we only have read permissions there.

If you want to run the timeseries tool, set ts_dir: /glade/derecho/scratch/${USER}/archive (or another directory you have write access to), add a variable or two to vars, and then run

cupid-timeseries
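For example, those two edits might look like this in config.yml (the scratch path and variable names below are illustrative; use a directory you can write to and variables that exist in your history files):

```yaml
global_params:
  ts_dir: /glade/derecho/scratch/${USER}/archive  # any directory you can write to

timeseries:
  lnd:
    vars: ['GPP', 'ELAI']  # illustrative variable names
```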

compute_notebooks Section

This section tells CUPiD which notebooks to run, and what parameters should be passed to each notebook in addition to the ones listed in global_params. CUPiD always runs the infrastructure section, and the user can specify which components (atm, ocn, lnd, etc.) should also be run. By default, CUPiD runs all the notebooks in this section.

The first key under each component (e.g. global_discharge_gauge_compare_obs in the runoff section) is the name of a notebook, and CUPiD will look in nblibrary/{component} for that file. In this example, CUPiD will run nblibrary/rof/global_discharge_gauge_compare_obs.ipynb as well as another runoff notebook and two land notebooks. As you can see, you can provide more than one notebook per component.
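The lookup described above amounts to joining nb_path_root, the component name, and the notebook key. A hypothetical sketch of that path construction (not CUPiD's actual code):

```python
import os

def notebook_path(nb_path_root, component, notebook_name):
    # CUPiD looks for {nb_path_root}/{component}/{notebook_name}.ipynb
    return os.path.join(nb_path_root, component, notebook_name + ".ipynb")

print(notebook_path("../../nblibrary", "rof", "global_discharge_gauge_compare_obs"))
# → ../../nblibrary/rof/global_discharge_gauge_compare_obs.ipynb
```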

compute_notebooks:

  # This is where all the notebooks you want run and their
  # parameters are specified. Several examples of different
  # types of notebooks are provided.

  # The first key (here infrastructure) is the name of the
  # notebook from nb_path_root, minus the .ipynb

    infrastructure:
      index:
        parameter_groups:
          none: {}

    rof:
      global_discharge_gauge_compare_obs:
        parameter_groups:
          none:
            hist_str: 'h0a'             # file tag, 'h0' or 'h0a'
            analysis_name: ""
            grid_name: 'f09_f09_mosart' # ROF grid name
            climo_nyears: 10
            figureSave: True
      global_discharge_ocean_compare_obs:
        parameter_groups:
          none:
            hist_str: 'h0a'             # file tag, 'h0' or 'h0a'
            analysis_name: ""
            grid_name: 'f09_f09_mosart' # ROF grid name
            climo_nyears: 10
            figureSave: True


    lnd:
      #Global_TerrestrialCouplingIndex_VisualCompareObs:
      #  parameter_groups:
      #    none:
      #      clmFile_h: 'clm2.h0a'
      #      fluxnet_comparison: True
      #      obsDir: 'lnd/analysis_datasets/ungridded/timeseries/FLUXNET2015/'
      ILAMB:
        parameter_groups:
          none:
            ilamb_root: ../../examples/land_only/ILAMB_output
            key_plots: ["EcosystemandCarbonCycle/GrossPrimaryProductivity/FLUXCOM/*_global_bias.png",
                        "EcosystemandCarbonCycle/LeafAreaIndex/AVHRR/*1_global_bias.png"]
            print_table: True
        external_tool:
          tool_name: 'ILAMB'
          ilamb_config_data_loc: '/glade/campaign/cesm/community/lmwg/diag/ILAMB/'
      LDF:
        kernel_name: cupid-analysis
        parameter_groups:
          none:
            ldf_root: ../../examples/land_only/LDF_output/
            key_plots: ["GPP_ANN_LatLon_Mean.png",
                        "ELAI_ANN_LatLon_Mean.png",
                        "RegionalClimo_Amazonia_RegionalClimo_Mean.png"]
        external_tool:
          tool_name: 'LDF'
          vars: ['ELAI','ET','QRUNOFF_TO_COUPLER','GPP']
          plotting_scripts:  ["global_latlon_map","polar_map",
                              "global_mean_timeseries_lnd", "regional_climatology",
                              "regional_timeseries"]
          analysis_scripts: ["lmwg_table"]
          base_regridded_output: False
          region_list: ["Global",'CONUS','Amazonia']
          defaults_file: ../../externals/LDF/lib/ldf_variable_defaults.yaml
          regions_file: ../../externals/LDF/lib/regions_lnd.yaml

book_toc Section

After running all the notebooks specified in compute_notebooks, CUPiD can use Jupyter Book to create a website. Unfortunately, there is not a great way to view HTML files stored on the NCAR supercomputers, so for this tutorial we will copy the HTML pages and images to your local desktop and look at them there. There are also several options for viewing the output of the notebooks CUPiD ran directly (JupyterHub, VS Code with the remote-ssh extension, etc.), but we won't demonstrate them here.

To build the website, the book_toc section lays out how to organize the notebooks into different chapters. Our examples organize the pages by component, but in other cases it may make sense to group notebooks differently (e.g. global surface plots in one section, time series plots of global means in another).

##################################
# Jupyter Book Table of Contents #
##################################
book_toc:

  # See https://jupyterbook.org/en/stable/structure/configure.html for
  # complete documentation of Jupyter book construction options

  format: jb-book

  # All filenames are notebook filename without the .ipynb, similar to above

  root: infrastructure/index # root is the notebook that will be the homepage for the book
  parts:

    # Parts group notebooks into different sections in the Jupyter book
    # table of contents, so you can organize different parts of your project.
    # Each chapter is the name of one of the notebooks that you executed
    # in compute_notebooks above, also without .ipynb

    - caption: Land
      chapters:
        #- file: lnd/Global_TerrestrialCouplingIndex_VisualCompareObs
        - file: lnd/ILAMB
        - file: lnd/LDF

    - caption: River Runoff
      chapters:
        - file: rof/global_discharge_gauge_compare_obs
        - file: rof/global_discharge_ocean_compare_obs

book_config_keys Section

This section is used to set the title of the Jupyter Book webpage. It should probably be combined with the book_toc section, or maybe it should be a command line argument instead.

#####################################
# Keys for Jupyter Book _config.yml #
#####################################
book_config_keys:

  title: CLM Key Metrics   # Title of your jupyter book

Checkpoint #5

At this point you have opened an interactive session on Casper and started running LDF and ILAMB to process CTSM output files that were created in advance. If these jobs are still running for most of you, we will move on to the details of what LDF and ILAMB are doing. If they have finished, the next step is running the CUPiD commands needed to generate the webpage:

  • First, exit your interactive Casper session:
exit

[NOTE]: Exiting the Casper interactive shell returns you to your login node. It also deactivates any conda environments you activated and discards any environment variables you set while in the shell.

Be sure to exit; otherwise the session remains active and prevents other users from using the processors that are now idle. When you are back on a login node, run cupid-diagnostics and cupid-webpage (in that order):

conda activate cupid-infrastructure
cupid-diagnostics -lnd
cupid-webpage

[NOTE]: If the above fails you may need to do the following:

conda activate cupid-analysis
python -m ipykernel install --user --name=cupid-analysis

(This is from http://ncar.github.io/CUPiD/#note)

Note: you can always remove a computed notebook and re-run cupid-diagnostics to get an updated version -- for example, if you ran it before the LDF output existed, you can rerun cupid-diagnostics and cupid-webpage once both LDF and ILAMB have finished.


Viewing the CUPiD webpage

The easiest way to look at the CUPiD webpage is to copy it to your laptop:

  • Open a terminal locally
  • Move to your Desktop
  • Set your Derecho username in the export command, and then use scp to copy the data from the remote machine:
export DUSER=<derecho user name>
scp -r $DUSER@derecho.hpc.ucar.edu:/glade/work/$DUSER/ctsm_cupid_2026/CTSM/tools/CUPiD/examples/land_only/computed_notebooks/_build/html CTSM_2026_CUPiD_webpage

To view the webpage, open the CTSM_2026_CUPiD_webpage directory on your desktop and then open index.html.

If you want to also save your notebooks from this tutorial, then run this command instead:

scp -r $DUSER@derecho.hpc.ucar.edu:/glade/work/$DUSER/ctsm_cupid_2026/CTSM/tools/CUPiD/examples/land_only/computed_notebooks .

Note that index.html will be nested under the _build/html directory inside computed_notebooks rather than in the top level of CTSM_2026_CUPiD_webpage.


Congratulations, you've run LDF and ILAMB in CUPiD!

Checkpoint #6

At this point, you should be able to look at the CUPiD-generated webpage on your laptop:

(screenshot of the CUPiD-generated webpage)

Again, we might need to come back to these steps after learning more about LDF and ILAMB.

Note there are known issues with seeing colors on the ILAMB landing page. Below are instructions that can help get around the issue:

  • Go into the output folder where your index.html file is located
  • Type python3 -m http.server

    it should show something like this: Serving HTTP on :: port 8000 (http://[::]:8000/) ...

  • Copy and paste that URL (http://[::]:8000/) into a web browser, and you should be able to see the colors


Next: Running CUPiD with a CTSM case submission
