Commit 14d952d

Made coarse-resolution large-ensemble MCC, ERI tests work.
Changes are needed in multiple components: cesm, cime, cmeps, mom/MOM6.
The branches are labeled with DART_lowres_{component}.

cime_config/config_compsets.xml
    Added BHIST_LT_DART compset.
cime_config/testlist_allactive.xml
    Added coarse resolution ERI and MCC tests for BHIST DART tests
cime_config/testmods_dirs/allactive/DART_BHIST_lowres/
    Testmod to make coarse resolution BHIST tests work.
    Must be used with the (new) MOM6 testmod: mom-tx10deg
    + include_user_mods (<- ./crossleap <- ./defaultio)
    + shell_commands; fix or sidestep several barriers to successful testing
    + user_nl_cam; namelist settings required by DART.
    + user_nl_clm; permit mid-year start dates in nondefault years.
    + user_nl_mosart; provide a frivinp file which works with the coarse resolution.
    + README_layout; instructions for setting NINST and MAX_TASKS_PER_NODE
      to enable running. Possibly not needed in the future.
1 parent aac0bd7 commit 14d952d

8 files changed

Lines changed: 99 additions & 0 deletions

cime_config/config_compsets.xml

Lines changed: 6 additions & 0 deletions
@@ -92,6 +92,12 @@
     <lname>HISTC_CAM70%LT_CLM60%BGC-CROP_CICE_MOM6_MOSART_DGLC%NOEVOLVE_WW3</lname>
   </compset>
 
+  <!-- BHIST_LTso, except: WW3 can't handle multi-instance (2026-3),
+       CROP doesn't work in this context. -->
+  <compset>
+    <alias>BHISTC_LT_DART</alias>
+    <lname>HISTC_CAM70%LT_CLM60%BGC_CICE_MOM6_MOSART_DGLC%NOEVOLVE_SWAV</lname>
+  </compset>
 
   <!-- Emissions driven compsets for CESM3 -->
 

cime_config/testlist_allactive.xml

Lines changed: 19 additions & 0 deletions
@@ -73,6 +73,25 @@
       <option name="wallclock"> 02:00:00 </option>
     </options>
   </test>
+  <!-- DART Low-res BHIST tests. Needs additional testmods -->
+  <test name="ERI" grid="ne3pg3_ne3pg3_t232" compset="BHISTC_LT_DART" testmods="allactive/DART_lowres,
+components/mom/cime_config/testdefs/testmods_dirs/mom/tx10deg">
+    <machines>
+      <machine name="derecho" compiler="intel" />
+    </machines>
+    <options>
+      <option name="wallclock"> 03:00:00 </option>
+    </options>
+  </test>
+  <test name="MCC" grid="ne3pg3_ne3pg3_t232" compset="BHISTC_LT_DART" testmods="allactive/DART_lowres,
+components/mom/cime_config/testdefs/testmods_dirs/mom/tx10deg">
+    <machines>
+      <machine name="derecho" compiler="intel" />
+    </machines>
+    <options>
+      <option name="wallclock"> 02:00:00 </option>
+    </options>
+  </test>
   <test name="SMS_Ld2" grid="ne30pg3_t232" compset="B1850C_LTso" testmods="allactive/defaultio">
     <machines>
       <machine name="derecho" compiler="intel" category="aux_cime_baselines"/>

cime_config/testmods_dirs/allactive/DART_BHIST_lowres/README_layout

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+If the coarse resolution (ne3pg3_ne3pg3_t232) is used in a BHIST MCC test,
+then the number of instances and MAX_TASKS_PER_NODE must be chosen
+in a way that's consistent with how cime set the PE layout.
+
+1) MOM6 coarse resolution is limited (2026-3) to 6 tasks.
+   The SE dycore is limited to 6 x ne^2 tasks,
+   so the ne3 grid could be given up to 54 tasks.
+   drv-tx10deg/shell_commands:ROOTPE_OCN assumes that only 6 will be given
+   to the other components, and then 6 to ocn, so tasks_per_inst = 12.
+   If more tasks are given to the other components,
+   then ROOTPE_OCN must be changed to that number.
+2) If tasks_per_inst = 12, NINST <= 10, and there are 128 PEs/node,
+   then the job will fit in 1 node and be run in derecho's develop queue.
+   The default PE layout will work.
+   You can stop here.
+   For NINST > 10 the job requires multiple nodes and the default layout
+   does not work because at least one instance will be split onto 2 nodes.
+   During testing these jobs never finished, despite trying their best.
+3) The minimum number of nodes required is
+      min_nodes = int[(NINST * tasks_per_inst) / PEs_per_node] + 1.
+   For NINST = 15 on derecho (128 PEs/node),
+      min_nodes = 2
+4) Cime calculates PEs/instance from
+      PEs_per_inst = int[(PEs_per_node * min_nodes) / NINST]
+                   = 17
+5) Whole instances must be assigned to each node.
+   That is, no instance can be split between 2 nodes.
+   In this example, put
+      NINST_per_node = int(NINST / min_nodes)
+                     = 7 instances
+   on each node by setting
+      MAX_TASKS_PER_NODE = NINST_per_node * PEs_per_inst
+                         = 7 * 17 = 119
+   in ./shell_commands.
+6) If your chosen NINST doesn't divide evenly by min_nodes,
+   then there will be "left over" instances after NINST_per_node * min_nodes
+   instances have been distributed.
+   They might fit into the last node.
+   If they don't, you'll need to add a node to the job request.
+   Alternatively, change NINST by a small number to make it divisible and try again,
+   starting at 3).
+7) Use NINST in the test modifier _C{NINST} of the test name.
+
+Assuming each instance gets 6 tasks for ocn and 6 for the other components:
+NINST = 15;
+   min_nodes = int[15 * 12 / 128] + 1 = 2
+   PEs_per_inst = int(128 * 2 / 15) = 17
+   NINST_per_node = int(15 / 2) = 7 with 1 instance left over
+   > MAX_TASKS_PER_NODE = 7 * 17 = 119
+   128 - 119 = 9 PEs are available in the last node,
+   which is not enough for the leftover instance, so an additional node is needed.
+NINST = 40;
+   min_nodes = int[40 * 12 / 128] + 1 = 4
+   PEs_per_inst = int(128 * 4 / 40) = 12
+   NINST_per_node = 40 / 4 = 10
+   > MAX_TASKS_PER_NODE = 10 * 12 = 120
+NINST = 80;
+   min_nodes = int[80 * 12 / 128] + 1 = 8
+   PEs_per_inst = int(128 * 8 / 80) = 12
+   NINST_per_node = 80 / 8 = 10
+   > MAX_TASKS_PER_NODE = 10 * 12 = 120
+
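
The layout arithmetic in README_layout steps 3) to 6) can be sketched as a small Python helper. This is only an illustration and is not part of the commit: the names `Layout` and `plan_layout` are hypothetical, the defaults (12 tasks per instance, 128 PEs per node) are the derecho values quoted in the README, and the floor division is an assumption about how cime rounds, inferred from the README's own worked numbers.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    min_nodes: int           # step 3
    pes_per_inst: int        # step 4
    ninst_per_node: int      # step 5
    max_tasks_per_node: int  # step 5, value for ./xmlchange MAX_TASKS_PER_NODE
    leftover_inst: int       # step 6, instances not yet placed
    needs_extra_node: bool   # step 6, True if the leftovers do not fit


def plan_layout(ninst: int, tasks_per_inst: int = 12,
                pes_per_node: int = 128) -> Layout:
    """Follow README_layout steps 3) to 6) with integer (floor) division.

    Assumption: int[...] in the README means truncation of a positive ratio.
    """
    if ninst < 1:
        raise ValueError("ninst must be a positive integer")

    # Step 3: minimum number of nodes.
    min_nodes = (ninst * tasks_per_inst) // pes_per_node + 1
    # Step 4: PEs per instance, as the README says cime computes it.
    pes_per_inst = (pes_per_node * min_nodes) // ninst
    # Step 5: whole instances per node, and the matching task cap.
    ninst_per_node = ninst // min_nodes
    max_tasks_per_node = ninst_per_node * pes_per_inst
    # Step 6: leftover instances and whether the spare PEs of one node
    # are enough to hold them.
    leftover_inst = ninst - ninst_per_node * min_nodes
    spare_pes = pes_per_node - max_tasks_per_node
    needs_extra_node = leftover_inst * pes_per_inst > spare_pes

    return Layout(min_nodes, pes_per_inst, ninst_per_node,
                  max_tasks_per_node, leftover_inst, needs_extra_node)


if __name__ == "__main__":
    # The three cases worked through at the end of README_layout.
    for n in (15, 40, 80):
        print(n, plan_layout(n))
```

For the README's three cases this reproduces its numbers: NINST = 15 yields MAX_TASKS_PER_NODE = 119 with one leftover instance that needs an extra node, while NINST = 40 and NINST = 80 both yield 120, the value set in shell_commands. For NINST <= 10 the README says the default PE layout already works, so the helper is not needed there.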

cime_config/testmods_dirs/allactive/DART_BHIST_lowres/include_user_mods

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+../crossleap

cime_config/testmods_dirs/allactive/DART_BHIST_lowres/shell_commands

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+driver=`./xmlquery --value COMP_INTERFACE`
+if [ "$driver" = "nuopc" ]; then
+    ./xmlchange GLC_NCPL=4
+fi
+
+./xmlchange JOB_PRIORITY=premium
+./xmlchange MAX_TASKS_PER_NODE=120

cime_config/testmods_dirs/allactive/DART_BHIST_lowres/user_nl_cam

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+inithist='ENDOFRUN'

cime_config/testmods_dirs/allactive/DART_BHIST_lowres/user_nl_clm

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+check_finidat_year_consistency = .false.
+for_testing_allow_non_annual_changes = .true.

cime_config/testmods_dirs/allactive/DART_BHIST_lowres/user_nl_mosart

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+frivinp = "/glade/campaign/cesm/cesmdata/inputdata/rof/mosart/MOSART_routing_Global_0.5x0.5_c170601.nc"
