Using NERSC's Perlmutter HPC #5513

romanlee · 2026-04-17T01:06:54Z

romanlee
Apr 17, 2026
Collaborator

Overview

NERSC's Perlmutter supercomputer is located at Lawrence Berkeley National Lab in Berkeley, California.

Perlmutter is a HPE (Hewlett Packard Enterprise) Cray EX supercomputer consisting of 1792 GPU accelerated nodes with 1 AMD EPYC 7763 processor and 4 NVIDIA A100 GPUs, 448~TB main and 328~TB GPU memory, and 3072 CPU-only nodes with 2 processors, all connected by Cray HPE Slingshot-11.

Perlmutter uses the Slurm Workload manager for batch job submission.

[Note, this post is subject to change. Let's try to keep it up to date, please comment below if something does not work.]

Scope

This discussion can cover anything to do with trying to get results from running Oceananigans on Perlmutter --- including installing Julia, setting up CUDA and MPI, configuring Slurm batch submission scripts, and using other Julia packages in conjunction with Oceananigans.

Links

Perlmutter docs

Getting started on Perlmutter

It's assumed as prerequisite that you have access to Perlmutter.

The first task is to download Julia. See the section of the NERSC docs on Julia for details.

Submit a multi-node GPU job via Slurm

Refer to the NERSC docs for the basics of running jobs on Perlmutter.

In what follows we will describe how to launch a simple, 2 node, 8 GPU simulation which exercises the CUDA-aware MPI implementation in Oceananigans.

First, create a script that will exercise the CUDA-aware MPI implementation, call it hello-cuda-mpi.jl:

using Oceananigans
using Oceananigans.Fields: interpolate!
using MPI
using CUDA
include("sanitize_environ.jl")

# Automatically distributes among available processors
arch = Distributed(GPU())

rank = arch.local_rank
Nranks = MPI.Comm_size(arch.communicator)
println("Hello from process $rank out of $Nranks")

x = y = z = (0, 1)
grid = RectilinearGrid(arch; size=(64, 64, 64), x, y, z, topology=(Periodic, Periodic, Bounded))

@info "The grid on rank $rank:"
@info "$grid"

c = CenterField(grid)
set!(c, (x, y, z) -> x * y^2 * z^3)

@info "c on rank $rank:"
@show c

u = XFaceField(grid)
set!(c, (x, y, z) -> x * y^2 * z^3)
interpolate!(u, c)

@info "u on rank $rank:"
@show u

Next, create a submission script, e.g. named job.sh that contains the following:

#!/bin/bash

# Base script generated by NERSC Batch Script Generator on https://iris.nersc.gov/jobscript.html

#SBATCH -N 2
#SBATCH -C gpu
#SBATCH -G 8
#SBATCH -q debug
#SBATCH -J jobname
#SBATCH -A <your project name>
#SBATCH -t 00:05:00

module purge
module load PrgEnv-gnu
module load gpu
module load julia

# OpenMP settings:
# Probably not necessary, but safer to include
export OMP_NUM_THREADS=1
export OMP_PLACES=threads
export OMP_PROC_BIND=spread

# configure cuda-aware mpi x julia x cray mpich
export JULIA_NUM_THREADS=1
export JULIA_CUDA_MEMORY_POOL=none
export MPICH_GPU_SUPPORT_ENABLED=1

srun -n 8 --ntasks-per-node=4 -c 32 --cpu_bind=cores -G 8 --gpu-bind=none julia --project hello-cuda-mpi.jl

The module and export commands load cray-mpich (the recommended MPI implementation to use on Perlmutter), the cuda-toolkit, and generally ensure your environment is set up properly.

For running your own simulations beyond this toy problem, the Perlmutter jobscript generator is a convenient resource for determining the correct #SBATCH and srun flags. However, the module loads and environmental variable exports should remain the same.

Now, launch the job!

$ sbatch job.sh

But wait! You probably encountered the following message printing on an infinite loop:

┌ Warning: malformed environment entry
│   env = ""
└ @ Base env.jl:223

This is due to a bug which currently exists in the Cray MPICH implementation whereby, on multi-node jobs launched with srun, a malformed environment entry gets inserted after a call to MPI_Init. CUDA.jl is sensitive to this malformed entry, and thus it breaks our hello-cuda-mpi.jl simulation.

Until this is fixed, we have the following workaround.

First, create a file called sanitize_environ.jl that contains:

module SanitizeEnviron

export sanitize_environ!, raw_entries, malformed_indices

function environ_ptr()
    unsafe_load(cglobal(:environ, Ptr{Ptr{UInt8}}))
end

function raw_entries(; limit::Int=4096)
    entries = String[]
    envp = environ_ptr()
    for i in 0:(limit - 1)
        ptr = unsafe_load(envp, i + 1)
        ptr == C_NULL && break
        push!(entries, unsafe_string(Base.Cstring(ptr)))
    end
    return entries
end

function malformed_indices(entries::AbstractVector{<:AbstractString})
    bad = Int[]
    for (i, entry) in pairs(entries)
        occursin('=', entry) || push!(bad, i)
    end
    return bad
end

function sanitize_environ!()
    entries = raw_entries()
    valid_entries = String[]
    removed = String[]
    for entry in entries
        if occursin('=', entry)
            push!(valid_entries, entry)
        else
            push!(removed, entry)
        end
    end

    ccall(:clearenv, Cint, ())
    for entry in valid_entries
        key, value = split(entry, '='; limit=2)
        rc = ccall(:setenv, Cint, (Cstring, Cstring, Cint), key, value, 1)
        rc == 0 || error("setenv failed for $key with rc=$rc")
    end
    return removed
end

end

Then, make the following changes to hello-cuda-mpi.jl. First, after using CUDA add the following line

include("sanitize_environ.jl")

Second, after arch = Distributed(GPU()) (which internally calls MPI_Init) add the following lines:

# fixes `Warning: malformed environment entry`
# Call after MPI.Init()
SanitizeEnviron.sanitize_environ!()

Now, relaunch the job from the command line with sbatch job.sh and you should get the correct output:

[ Info: MPI has not been initialized, so we are calling MPI.Init().
[ Info: MPI has not been initialized, so we are calling MPI.Init().
[ Info: MPI has not been initialized, so we are calling MPI.Init().
[ Info: MPI has not been initialized, so we are calling MPI.Init().
[ Info: MPI has not been initialized, so we are calling MPI.Init().
[ Info: MPI has not been initialized, so we are calling MPI.Init().
[ Info: MPI has not been initialized, so we are calling MPI.Init().
[ Info: MPI has not been initialized, so we are calling MPI.Init().
Hello from process 0 out of 8
Hello from process 4 out of 8
Hello from process 2 out of 8
Hello from process 1 out of 8
Hello from process 3 out of 8
Hello from process 7 out of 8
Hello from process 6 out of 8
Hello from process 5 out of 8
[ Info: The grid on rank 2:
[ Info: The grid on rank 3:
[ Info: The grid on rank 4:
[ Info: The grid on rank 5:
[ Info: The grid on rank 1:
[ Info: The grid on rank 6:
[ Info: The grid on rank 7:
[ Info: The grid on rank 0:
┌ Info: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
│ ├── FullyConnected x ∈ [0.25, 0.375) regularly spaced with Δx=0.015625
│ ├── Periodic y ∈ [0.0, 1.0)          regularly spaced with Δy=0.015625
└ └── Bounded  z ∈ [0.0, 1.0]          regularly spaced with Δz=0.015625
┌ Info: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
│ ├── FullyConnected x ∈ [0.5, 0.625) regularly spaced with Δx=0.015625
│ ├── Periodic y ∈ [0.0, 1.0)         regularly spaced with Δy=0.015625
└ └── Bounded  z ∈ [0.0, 1.0]         regularly spaced with Δz=0.015625
┌ Info: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
│ ├── FullyConnected x ∈ [0.625, 0.75) regularly spaced with Δx=0.015625
│ ├── Periodic y ∈ [0.0, 1.0)          regularly spaced with Δy=0.015625
└ └── Bounded  z ∈ [0.0, 1.0]          regularly spaced with Δz=0.015625
┌ Info: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
│ ├── FullyConnected x ∈ [0.125, 0.25) regularly spaced with Δx=0.015625
│ ├── Periodic y ∈ [0.0, 1.0)          regularly spaced with Δy=0.015625
└ └── Bounded  z ∈ [0.0, 1.0]          regularly spaced with Δz=0.015625
┌ Info: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
│ ├── FullyConnected x ∈ [0.375, 0.5) regularly spaced with Δx=0.015625
│ ├── Periodic y ∈ [0.0, 1.0)         regularly spaced with Δy=0.015625
└ └── Bounded  z ∈ [0.0, 1.0]         regularly spaced with Δz=0.015625
┌ Info: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
│ ├── FullyConnected x ∈ [0.75, 0.875) regularly spaced with Δx=0.015625
│ ├── Periodic y ∈ [0.0, 1.0)          regularly spaced with Δy=0.015625
└ └── Bounded  z ∈ [0.0, 1.0]          regularly spaced with Δz=0.015625
┌ Info: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
│ ├── FullyConnected x ∈ [0.875, 1.0) regularly spaced with Δx=0.015625
│ ├── Periodic y ∈ [0.0, 1.0)         regularly spaced with Δy=0.015625
└ └── Bounded  z ∈ [0.0, 1.0]         regularly spaced with Δz=0.015625
┌ Info: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
│ ├── FullyConnected x ∈ [0.0, 0.125) regularly spaced with Δx=0.015625
│ ├── Periodic y ∈ [0.0, 1.0)         regularly spaced with Δy=0.015625
└ └── Bounded  z ∈ [0.0, 1.0]         regularly spaced with Δz=0.015625
[ Info: c on rank 2:
[ Info: c on rank 3:
[ Info: c on rank 1:
[ Info: c on rank 5:
[ Info: c on rank 4:
[ Info: c on rank 6:
[ Info: c on rank 7:
[ Info: c on rank 0:
c = 8×64×64 Field{Center, Center, Center} on RectilinearGrid on Distributed{CUDAGPU} on 8×1×1
├── grid: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: DistributedCommunication, east: DistributedCommunication, south: Periodic, north: Periodic, bottom: ZeroFlux, top: ZeroFlux, immersed: Nothing
└── data: 14×70×70 OffsetArray(::CuArray{Float64, 3, CUDA.DeviceMemory}, -2:11, -2:67, -2:67) with eltype Float64 with indices -2:11×-2:67×-2:67
    └── max=0.954031, min=2.27374e-13, mean=0.333272
c = 8×64×64 Field{Center, Center, Center} on RectilinearGrid on Distributed{CUDAGPU} on 8×1×1
├── grid: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: DistributedCommunication, east: DistributedCommunication, south: Periodic, north: Periodic, bottom: ZeroFlux, top: ZeroFlux, immersed: Nothing
└── data: 14×70×70 OffsetArray(::CuArray{Float64, 3, CUDA.DeviceMemory}, -2:11, -2:67, -2:67) with eltype Float64 with indices -2:11×-2:67×-2:67
    └── max=0.954031, min=2.27374e-13, mean=0.333272
c = 8×64×64 Field{Center, Center, Center} on RectilinearGrid on Distributed{CUDAGPU} on 8×1×1
├── grid: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: DistributedCommunication, east: DistributedCommunication, south: Periodic, north: Periodic, bottom: ZeroFlux, top: ZeroFlux, immersed: Nothing
└── data: 14×70×70 OffsetArray(::CuArray{Float64, 3, CUDA.DeviceMemory}, -2:11, -2:67, -2:67) with eltype Float64 with indices -2:11×-2:67×-2:67
    └── max=0.954031, min=2.27374e-13, mean=0.333272
c = 8×64×64 Field{Center, Center, Center} on RectilinearGrid on Distributed{CUDAGPU} on 8×1×1
├── grid: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: DistributedCommunication, east: DistributedCommunication, south: Periodic, north: Periodic, bottom: ZeroFlux, top: ZeroFlux, immersed: Nothing
└── data: 14×70×70 OffsetArray(::CuArray{Float64, 3, CUDA.DeviceMemory}, -2:11, -2:67, -2:67) with eltype Float64 with indices -2:11×-2:67×-2:67
    └── max=0.954031, min=2.27374e-13, mean=0.333272
c = 8×64×64 Field{Center, Center, Center} on RectilinearGrid on Distributed{CUDAGPU} on 8×1×1
├── grid: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: DistributedCommunication, east: DistributedCommunication, south: Periodic, north: Periodic, bottom: ZeroFlux, top: ZeroFlux, immersed: Nothing
└── data: 14×70×70 OffsetArray(::CuArray{Float64, 3, CUDA.DeviceMemory}, -2:11, -2:67, -2:67) with eltype Float64 with indices -2:11×-2:67×-2:67
    └── max=0.954031, min=2.27374e-13, mean=0.333272
c = 8×64×64 Field{Center, Center, Center} on RectilinearGrid on Distributed{CUDAGPU} on 8×1×1
├── grid: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: DistributedCommunication, east: DistributedCommunication, south: Periodic, north: Periodic, bottom: ZeroFlux, top: ZeroFlux, immersed: Nothing
└── data: 14×70×70 OffsetArray(::CuArray{Float64, 3, CUDA.DeviceMemory}, -2:11, -2:67, -2:67) with eltype Float64 with indices -2:11×-2:67×-2:67
    └── max=0.954031, min=2.27374e-13, mean=0.333272
c = 8×64×64 Field{Center, Center, Center} on RectilinearGrid on Distributed{CUDAGPU} on 8×1×1
├── grid: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: DistributedCommunication, east: DistributedCommunication, south: Periodic, north: Periodic, bottom: ZeroFlux, top: ZeroFlux, immersed: Nothing
└── data: 14×70×70 OffsetArray(::CuArray{Float64, 3, CUDA.DeviceMemory}, -2:11, -2:67, -2:67) with eltype Float64 with indices -2:11×-2:67×-2:67
    └── max=0.954031, min=2.27374e-13, mean=0.333272
c = 8×64×64 Field{Center, Center, Center} on RectilinearGrid on Distributed{CUDAGPU} on 8×1×1
├── grid: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: DistributedCommunication, east: DistributedCommunication, south: Periodic, north: Periodic, bottom: ZeroFlux, top: ZeroFlux, immersed: Nothing
└── data: 14×70×70 OffsetArray(::CuArray{Float64, 3, CUDA.DeviceMemory}, -2:11, -2:67, -2:67) with eltype Float64 with indices -2:11×-2:67×-2:67
    └── max=0.954031, min=2.27374e-13, mean=0.333272
[ Info: u on rank 3:
[ Info: u on rank 4:
[ Info: u on rank 1:
[ Info: u on rank 0:
[ Info: u on rank 2:
[ Info: u on rank 7:
[ Info: u on rank 5:
[ Info: u on rank 6:
u = 8×64×64 Field{Face, Center, Center} on RectilinearGrid on Distributed{CUDAGPU} on 8×1×1
├── grid: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: DistributedCommunication, east: DistributedCommunication, south: Periodic, north: Periodic, bottom: ZeroFlux, top: ZeroFlux, immersed: Nothing
└── data: 14×70×70 OffsetArray(::CuArray{Float64, 3, CUDA.DeviceMemory}, -2:11, -2:67, -2:67) with eltype Float64 with indices -2:11×-2:67×-2:67
    └── max=0.946519, min=1.13687e-13, mean=0.310165
u = 8×64×64 Field{Face, Center, Center} on RectilinearGrid on Distributed{CUDAGPU} on 8×1×1
├── grid: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: DistributedCommunication, east: DistributedCommunication, south: Periodic, north: Periodic, bottom: ZeroFlux, top: ZeroFlux, immersed: Nothing
└── data: 14×70×70 OffsetArray(::CuArray{Float64, 3, CUDA.DeviceMemory}, -2:11, -2:67, -2:67) with eltype Float64 with indices -2:11×-2:67×-2:67
    └── max=0.946519, min=1.13687e-13, mean=0.310165
u = 8×64×64 Field{Face, Center, Center} on RectilinearGrid on Distributed{CUDAGPU} on 8×1×1
├── grid: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: DistributedCommunication, east: DistributedCommunication, south: Periodic, north: Periodic, bottom: ZeroFlux, top: ZeroFlux, immersed: Nothing
└── data: 14×70×70 OffsetArray(::CuArray{Float64, 3, CUDA.DeviceMemory}, -2:11, -2:67, -2:67) with eltype Float64 with indices -2:11×-2:67×-2:67
    └── max=0.946519, min=1.13687e-13, mean=0.310165
u = 8×64×64 Field{Face, Center, Center} on RectilinearGrid on Distributed{CUDAGPU} on 8×1×1
├── grid: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: DistributedCommunication, east: DistributedCommunication, south: Periodic, north: Periodic, bottom: ZeroFlux, top: ZeroFlux, immersed: Nothing
└── data: 14×70×70 OffsetArray(::CuArray{Float64, 3, CUDA.DeviceMemory}, -2:11, -2:67, -2:67) with eltype Float64 with indices -2:11×-2:67×-2:67
    └── max=0.946519, min=1.13687e-13, mean=0.310165
u = 8×64×64 Field{Face, Center, Center} on RectilinearGrid on Distributed{CUDAGPU} on 8×1×1
├── grid: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: DistributedCommunication, east: DistributedCommunication, south: Periodic, north: Periodic, bottom: ZeroFlux, top: ZeroFlux, immersed: Nothing
└── data: 14×70×70 OffsetArray(::CuArray{Float64, 3, CUDA.DeviceMemory}, -2:11, -2:67, -2:67) with eltype Float64 with indices -2:11×-2:67×-2:67
    └── max=0.946519, min=1.13687e-13, mean=0.310165
u = 8×64×64 Field{Face, Center, Center} on RectilinearGrid on Distributed{CUDAGPU} on 8×1×1
├── grid: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: DistributedCommunication, east: DistributedCommunication, south: Periodic, north: Periodic, bottom: ZeroFlux, top: ZeroFlux, immersed: Nothing
└── data: 14×70×70 OffsetArray(::CuArray{Float64, 3, CUDA.DeviceMemory}, -2:11, -2:67, -2:67) with eltype Float64 with indices -2:11×-2:67×-2:67
    └── max=0.946519, min=1.13687e-13, mean=0.310165
u = 8×64×64 Field{Face, Center, Center} on RectilinearGrid on Distributed{CUDAGPU} on 8×1×1
├── grid: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: DistributedCommunication, east: DistributedCommunication, south: Periodic, north: Periodic, bottom: ZeroFlux, top: ZeroFlux, immersed: Nothing
└── data: 14×70×70 OffsetArray(::CuArray{Float64, 3, CUDA.DeviceMemory}, -2:11, -2:67, -2:67) with eltype Float64 with indices -2:11×-2:67×-2:67
    └── max=0.946519, min=1.13687e-13, mean=0.310165
u = 8×64×64 Field{Face, Center, Center} on RectilinearGrid on Distributed{CUDAGPU} on 8×1×1
├── grid: 8×64×64 (distributed across 8×1×1 ranks) RectilinearGrid{Float64, FullyConnected, Periodic, Bounded} on Distributed{CUDAGPU} on 8×1×1 with 3×3×3 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: DistributedCommunication, east: DistributedCommunication, south: Periodic, north: Periodic, bottom: ZeroFlux, top: ZeroFlux, immersed: Nothing
└── data: 14×70×70 OffsetArray(::CuArray{Float64, 3, CUDA.DeviceMemory}, -2:11, -2:67, -2:67) with eltype Float64 with indices -2:11×-2:67×-2:67
    └── max=0.946519, min=1.13687e-13, mean=0.310165

Et voilà! You now have a CUDA-aware MPI Oceananigans configuration!! 🎉

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using NERSC's Perlmutter HPC #5513

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 0 comments

Select a reply

Uh oh!

Using NERSC's Perlmutter HPC #5513

Uh oh!

Uh oh!

romanlee Apr 17, 2026 Collaborator

Overview

Scope

Links

Getting started on Perlmutter

Submit a multi-node GPU job via Slurm

Replies: 0 comments

romanlee
Apr 17, 2026
Collaborator