
Conversation


cyrush commented Jan 9, 2026

No description provided.


cyrush commented Jan 9, 2026

==> Added repo to config with name 'conduit'.
[adding spack package: 'uberenv-conduit@develop%gcc~mpi~hdf5~silo~parmetis~adios~python~doc']
[exe: /home/user/conduit/uberenv_libs/spack/bin/spack -D /home/user/conduit/uberenv_libs/spack_env add 'uberenv-conduit@develop%gcc~mpi~hdf5~silo~parmetis~adios~python~doc']
==> Adding uberenv-conduit@develop %gcc~adios~doc~hdf5~mpi~parmetis~python~silo to environment /home/user/conduit/uberenv_libs/spack_env
[concretizing spack env]
[exe: /home/user/conduit/uberenv_libs/spack/bin/spack -D /home/user/conduit/uberenv_libs/spack_env concretize --fresh ]
==> Compilers have been configured automatically from PATH inspection
==> Error: No such variant {'python', 'doc', 'silo', 'hdf5', 'adios', 'parmetis', 'mpi'} for spec: 'gcc~adios~doc~hdf5~mpi~parmetis~python~silo'
[spack version: 1.1.0 (0c2be44e4ece21eb091ad5de4c97716b7c6d4c87)]
[exe: /home/user/conduit/uberenv_libs/spack/bin/spack -D /home/user/conduit/uberenv_libs/spack_env spec --fresh --install-status --very-long]
==> Error: No such variant {'mpi', 'python', 'silo', 'adios', 'parmetis', 'hdf5', 'doc'} for spec: 'gcc~adios~doc~hdf5~mpi~parmetis~python~silo'

[exe: /home/user/conduit/uberenv_libs/spack/bin/spack -D /home/user/conduit/uberenv_libs/spack_env -k install --fail-fast --fresh ]
==> Warning: You asked for --insecure. Will NOT check SSL certificates.
==> Error: No such variant {'mpi', 'silo', 'python', 'adios', 'hdf5', 'parmetis', 'doc'} for spec: 'gcc~adios~doc~hdf5~mpi~parmetis~python~silo'

It looks like the spec is getting mangled somehow. The package name should not be gcc.

Answer:

Newer spack needs the compiler at the end of a spec.
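
A minimal sketch of the reordering, using the spec string from the log above (the actual uberenv fix may look different):

# old ordering: %gcc in the middle, so spack 1.x attaches the trailing variants to gcc
uberenv-conduit@develop%gcc~mpi~hdf5~silo~parmetis~adios~python~doc

# ordering spack 1.x expects: variants on the package first, compiler last
uberenv-conduit@develop~mpi~hdf5~silo~parmetis~adios~python~doc %gcc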


cyrush commented Jan 13, 2026

For HDF5 2.0, the source tarball on the HDF5 Group website has a different shasum than the source tarball on GitHub.

https://support.hdfgroup.org/releases/hdf5 tarball:

$ curl -O https://support.hdfgroup.org/releases/hdf5/v2_0/v2_0_0/downloads/hdf5-2.0.0.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 40.0M  100 40.0M    0     0  10.5M      0  0:00:03  0:00:03 --:--:-- 10.5M
$ shasum -a 256 hdf5-2.0.0.tar.gz 
6e45a4213cb11bb5860e1b0a7645688ab55562cc2d55c6ff9bcb0984ed12b22b  hdf5-2.0.0.tar.gz

github tarball:

$ curl -LO https://github.com/HDFGroup/hdf5/releases/download/2.0.0/hdf5-2.0.0.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 40.0M  100 40.0M    0     0  9321k      0  0:00:04  0:00:04 --:--:-- 9896k
$ shasum -a 256 hdf5-2.0.0.tar.gz 
f4c2edc5668fb846627182708dbe1e16c60c467e63177a75b0b9f12c19d7efed  hdf5-2.0.0.tar.gz
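
If it matters whether the two tarballs differ in content or only in packaging metadata, a quick check would be something like this (the two local file names here are made up):

$ tar -tzf hdf5-2.0.0-hdfgroup.tar.gz | sort > hdfgroup.lst
$ tar -tzf hdf5-2.0.0-github.tar.gz | sort > github.lst
$ diff hdfgroup.lst github.lst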

zfp_cmake_dir=${zfp_install_dir}/lib64/cmake/zfp/
else
zfp_cmake_dir=${zfp_install_dir}/lib/cmake/zfp/
fi

I think this will resolve #1499


cyrush commented Jan 21, 2026

Windows build_conduit -- the umpire smoke test is failing:

    10/116 Test  #10: t_umpire_smoke .............................Exit code 0xc0000135
  ***Exception:   0.11 sec

I think it is failing to find a DLL -- exit code 0xc0000135 is STATUS_DLL_NOT_FOUND. Not sure which one is missing yet ...
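
One way to narrow it down (a sketch, run from a Visual Studio developer prompt; the test exe path is a guess):

dumpbin /DEPENDENTS tests\t_umpire_smoke.exe

Then check which of the listed DLLs are not next to the exe or on PATH.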


cyrush commented Jan 22, 2026

Updating to new TPLs with spack yields two errors:

In t_python_conduit_node:

64: ======================================================================
64: ERROR: test_set_external_with_buffer (t_python_conduit_node.Test_Conduit_Node.test_set_external_with_buffer)
64: ----------------------------------------------------------------------
64: SystemError: Objects/abstract.c:430: bad argument to internal function
64: 
64: The above exception was the direct cause of the following exception:
64: 
64: Traceback (most recent call last):
64:   File "/home/user/conduit/src/tests/conduit/python/t_python_conduit_node.py", line 699, in test_set_external_with_buffer
64:     n.set_external(s_compact,ra)
64:     ~~~~~~~~~~~~~~^^^^^^^^^^^^^^
64: SystemError: <method 'set_external' of 'Node' objects> returned a result with an exception set

This is using:

Python 3.13.11 (main, Jan 22 2026, 00:08:06) [GCC 11.4.0] on linux
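
To iterate on just this case outside of ctest, the test can be run by name (a sketch; it assumes the built conduit python module is on PYTHONPATH):

$ python3 /home/user/conduit/src/tests/conduit/python/t_python_conduit_node.py Test_Conduit_Node.test_set_external_with_buffer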

In t_relay_io_silo:

96: [ RUN      ] conduit_relay_io_silo.round_trip_save_option_nameschemes_n_files_n_domains
96: [/home/user/conduit/src/libs/relay/conduit_relay_io_silo.cpp : 8344]
96:  Silo save: Overlink: topo name not provided or not found.
96: [/home/user/conduit/src/libs/relay/conduit_relay_io_silo.cpp : 8352]
96:  Silo save: Overlink: topo name defaulting to topo
96: unknown file: Failure
96: C++ exception with description "
96: file: /home/user/conduit/src/libs/conduit/conduit_node.cpp
96: line: 15028
96: message: 
96: Invalid child index: 0 (number of children: 0)
96: " thrown in the test body.
96: 
96: [  FAILED  ] conduit_relay_io_silo.round_trip_save_option_nameschemes_n_files_n_domains (61 ms)

96: [  FAILED  ] conduit_relay_io_silo.round_trip_save_option_nameschemes_n_files_n_domains
96: 
96:  1 FAILED TEST
1/1 Test #96: t_relay_io_silo ..................***Failed   42.69 sec


cyrush commented Jan 22, 2026

Silo issue: it seems a load is failing (maybe the write is failing as well).

load_mesh number of children: 0
save_mesh number of children: 5
save mesh child idx: 0
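
The backtrace below is from a gdb catchpoint on C++ exceptions, roughly this setup (the binary path is assumed, and the gtest filter is only there to narrow to the failing case):

$ gdb --args tests/relay/t_relay_io_silo --gtest_filter='*nameschemes_n_files_n_domains*'
(gdb) catch throw
(gdb) run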

Catchpoint 1 (exception thrown), 0x00007f12b13a94a1 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) bt 
#0  0x00007f12b13a94a1 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x0000564ff817a991 in conduit::utils::default_error_handler (msg="Invalid child index: 0 (number of children: 0)", 
    file="/home/user/conduit/src/libs/conduit/conduit_node.cpp", line=15028)
    at /home/user/conduit/src/libs/conduit/conduit_utils.cpp:433
#2  0x0000564ff817aa11 in conduit::utils::handle_error (msg="Invalid child index: 0 (number of children: 0)", 
    file="/home/user/conduit/src/libs/conduit/conduit_node.cpp", line=15028)
    at /home/user/conduit/src/libs/conduit/conduit_utils.cpp:466
#3  0x0000564ff813ff4f in conduit::Node::child (this=0x7ffefd9981b0, idx=0)
    at /home/user/conduit/src/libs/conduit/conduit_node.cpp:15028
#4  0x0000564ff8140395 in conduit::Node::operator[] (this=0x7ffefd9981b0, idx=0)
    at /home/user/conduit/src/libs/conduit/conduit_node.cpp:15102
#5  0x0000564ff8071864 in conduit_relay_io_silo_round_trip_save_option_nameschemes_n_files_n_domains_Test::TestBody (
    this=0x564ff9866040) at /home/user/conduit/src/tests/relay/t_relay_io_silo.cpp:1870

@JustinPrivitera any ideas on what could trigger this?


cyrush commented Jan 22, 2026

mrgtree_name                  =    (null)
tv_connectivity               =    0
disjoint_mode                 =    0
topo_dim                      =    not specified
file_ns                       =    "|silo_save_option_nameschemes_n_files_n_domains_yes_spiral.cycle_000000/domain_%06d.silo|n"
block_ns                      =    "|mesh/topo"
block_type                    =    500
repr_block_idx                =    not specified
alt_nodenum_vars              =    NULL
alt_zonenum_vars              =    NULL
empty_cnt                     =    0
empty_list                    =    NULL
meshids                       =    NULL
meshnames                     =    NULL
meshtypes                     =    NULL
dirids                        =    NULL
extents                       =    NULL
zonecounts                    =    NULL
has_external_zones            =    NULL
lgroupings                    =    0
groupings                     =    NULL
groupnames                    =    NULL


cyrush commented Jan 23, 2026

Windows path style issue on some of the tests -- the two strings below differ only in the path separator:

2026-01-23T18:31:06.5590054Z D:\a\conduit\conduit\src\tests\relay\t_relay_io_silo.cpp(2288): error : Expected equality of these values: [D:\a\conduit\conduit\build\RUN_TESTS.vcxproj]
2026-01-23T18:31:06.5590148Z     meshnames[domid]
2026-01-23T18:31:06.5590512Z       Which is: "overlink_save_option_nameschemes_m_domains_n_files_no_spiral\\domfile1.silo:domain3/MESH"
2026-01-23T18:31:06.5590566Z     meshname
2026-01-23T18:31:06.5590816Z       Which is: "overlink_save_option_nameschemes_m_domains_n_files_no_spiral/domfile1.silo:domain3/MESH"

cyrush changed the title from "uberenv update to spack 1.1.0" to "uberenv update to spack 1.1.1" on Jan 28, 2026

cyrush commented Jan 28, 2026

The newer spack build of parmetis yields a different result for the polyhedral test:

[Screenshot 2026-01-27 at 4.35.42 PM]

left : old (current baseline)
right: new

Input distribution for the parmetis poly test, for reference:
[Screenshot 2026-01-27 at 4.38.18 PM]

@BradWhitlock -- I think a rebaseline is fine here, what do you think?


cyrush commented Jan 30, 2026

With new builds of metis and parmetis, we get a different result for the polyhedra test on macOS than on Linux.


cyrush commented Feb 2, 2026

The parmetis polyhedra results are not the same on all platforms.

(They used to be.)

Linux metis and parmetis spack spec:

root@ed773f55ad04:/home/user/conduit/uberenv_libs/spack_env# spack spec

[+]      ^[email protected]~gdb~int64~ipo~no_warning~real64+shared build_system=cmake build_type=Release generator=make patches:=4991da9,93a7903,b1225da platform=linux os=ubuntu22.04 target=zen2 %c,[email protected]
[+]      ^[email protected]~gdb~int64~ipo+shared build_system=cmake build_type=Release generator=make patches:=4f89253,50ed208,704b84f platform=linux os=ubuntu22.04 target=zen2 %c,[email protected]

macOS metis and parmetis spack spec:

[harrison37@zeliak spack_env (task/2026_01_ci_updates)]$ spack spec
[+]      ^[email protected]~gdb~int64~ipo~no_warning~real64+shared build_system=cmake build_type=Release generator=make patches:=4991da9,93a7903 platform=darwin os=sequoia target=m1 %c,[email protected]
[+]      ^[email protected]~gdb~int64~ipo+shared build_system=cmake build_type=Release generator=make patches:=4f89253,50ed208,704b84f platform=darwin os=sequoia target=m1 %c,[email protected]

The spack specs are effectively the same -- only platform details (OS, target) and one Linux-only metis patch differ.

The result depends on a random seed, which is the same on all platforms.
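
To rule out concretization differences mechanically, the env specs from both machines could be dumped and diffed (a sketch; the output file names are made up, and the macOS file is produced the same way on that machine):

$ spack -D /home/user/conduit/uberenv_libs/spack_env spec > linux_specs.txt
$ diff linux_specs.txt macos_specs.txt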


cyrush commented Feb 2, 2026

Old spec for the Linux build:

[spack version: 0.23.1 (2bfcc69fa870d3c6919be87593f22647981b648a)]

2026-02-02T17:46:43.7131194Z  -   zqois2f      ^[email protected]%[email protected]~gdb~int64~ipo~real64+shared build_system=cmake build_type=Release generator=make patches=4991da9,93a7903,b1225da arch=linux-ubuntu20.04-zen2

2026-02-02T17:46:43.7132915Z  -   gyulqlm      ^[email protected]%[email protected]~gdb~int64~ipo+shared build_system=cmake build_type=Release generator=make patches=4f89253,50ed208,704b84f arch=linux-ubuntu20.04-zen2


cyrush commented Feb 2, 2026

No leads on spack spec differences or patches being the cause. I think we are subject to a difference in RNG behavior across platforms.

Going to change the check to look for a good balance (the desired outcome) instead of an exact baseline match.


cyrush commented Feb 3, 2026

In the docker container test, MPI tests are failing with:

Invalid rank has value 1 but must be nonnegative and less than 1

I think this is an environment issue; I suspect docker in the Azure env is locked down in some way and we can't oversubscribe processes to run MPI.

The BLT MPI smoke test passes -- it tests a reduce, which would still pass if the MPI run somehow had 1 rank instead of the expected 4.
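
A sketch of what I mean, assuming the container's MPI is Open MPI (the smoke test name here is just a placeholder):

$ mpiexec -n 4 hostname | wc -l        # counts how many ranks actually launch
$ mpiexec -n 4 --oversubscribe ./tests/t_mpi_smoke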
