A GPU kernel that delivers a 16.8x speedup of the Discrete Interaction Approximation (DIA), the primary bottleneck in NOAA's WAVEWATCH III (WW3) wave model.

| Method | Time (ms) | Speedup | Error |
|---|---|---|---|
| Sequential (1 thread/grid point) | 21.56 | 1.0x | Reference |
| GPU parallel (1 thread/spectral bin/grid point) | 1.29 | 16.8x | 0.000 (bit-identical) |

Benchmark size: 10,000 grid points x 1,152 spectral bins per point (36 directions x 32 frequencies). Throughput of the parallel kernel: 8.95 billion DIA evaluations/sec.

The Discrete Interaction Approximation (DIA; Hasselmann & Hasselmann 1985, WAMDI 1988) computes nonlinear four-wave interactions in spectral wave models. It is the primary computational bottleneck in WAVEWATCH III (WW3).

The DIA is embarrassingly parallel across spectral bins: each bin reads energy from neighboring bins via pre-computed index arrays but writes only to its own output, so no two bins ever write to the same location. The upstream WW3 code processes bins sequentially. Our GPU kernel assigns one thread per spectral bin per grid point, parallelizing across both dimensions simultaneously.
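
The two thread mappings compared in the table can be sketched as below. This is a minimal illustration, not the code in `ww3_dia_benchmark.cu`: the names `dia_term`, `dia_sequential`, `dia_parallel`, `count_mismatches`, the index construction, and the interaction formula are all assumptions made up for the example. In particular, `dia_term` is a placeholder product of three bin energies, not the calibrated DIA expression.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int NTH = 36;          // directions (from the benchmark description)
constexpr int NK = 32;           // frequencies
constexpr int NSPEC = NTH * NK;  // 1,152 spectral bins per grid point

// Placeholder interaction term (hypothetical, NOT the calibrated DIA formula).
__device__ float dia_term(float e, float ep, float em) {
    return e * e * (ep + em) - 2.0f * e * ep * em;
}

// Baseline mapping: one thread per grid point, bins handled in a loop.
__global__ void dia_sequential(const float* E, const int* ip, const int* im,
                               float* S, int npts) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= npts) return;
    size_t base = (size_t)p * NSPEC;
    for (int b = 0; b < NSPEC; ++b)
        S[base + b] = dia_term(E[base + b], E[base + ip[b]], E[base + im[b]]);
}

// Parallel mapping: one thread per (grid point, spectral bin) pair.
// Each thread gathers from neighbor bins and writes only its own output.
__global__ void dia_parallel(const float* E, const int* ip, const int* im,
                             float* S, int npts) {
    size_t t = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= (size_t)npts * NSPEC) return;
    int b = (int)(t % NSPEC);  // bin index within this grid point
    size_t base = t - b;       // offset of this grid point's spectrum
    S[t] = dia_term(E[t], E[base + ip[b]], E[base + im[b]]);
}

// Runs both kernels on synthetic data; returns the number of outputs that
// differ, or -1 if a CUDA call failed.
int count_mismatches(int npts) {
    size_t n = (size_t)npts * NSPEC;
    std::vector<float> hE(n), hA(n), hB(n);
    std::vector<int> hip(NSPEC), him(NSPEC);
    // Hypothetical neighbor tables: one step up/down in frequency (clamped)
    // and one step left/right in direction (periodic).
    for (int k = 0; k < NK; ++k)
        for (int th = 0; th < NTH; ++th) {
            int b = k * NTH + th;
            int kp = (k + 1 < NK) ? k + 1 : NK - 1;
            int km = (k > 0) ? k - 1 : 0;
            hip[b] = kp * NTH + (th + 1) % NTH;
            him[b] = km * NTH + (th + NTH - 1) % NTH;
        }
    for (size_t i = 0; i < n; ++i) hE[i] = 0.001f * (float)(i % 977);

    float *dE = nullptr, *dA = nullptr, *dB = nullptr;
    int *d_ip = nullptr, *d_im = nullptr;
    cudaMalloc(&dE, n * sizeof(float));
    cudaMalloc(&dA, n * sizeof(float));
    cudaMalloc(&dB, n * sizeof(float));
    cudaMalloc(&d_ip, NSPEC * sizeof(int));
    cudaMalloc(&d_im, NSPEC * sizeof(int));
    cudaMemcpy(dE, hE.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_ip, hip.data(), NSPEC * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_im, him.data(), NSPEC * sizeof(int), cudaMemcpyHostToDevice);

    dia_sequential<<<(npts + 127) / 128, 128>>>(dE, d_ip, d_im, dA, npts);
    dia_parallel<<<(unsigned)((n + 255) / 256), 256>>>(dE, d_ip, d_im, dB, npts);
    cudaMemcpy(hA.data(), dA, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(hB.data(), dB, n * sizeof(float), cudaMemcpyDeviceToHost);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess) &&
              (cudaGetLastError() == cudaSuccess);
    cudaFree(dE); cudaFree(dA); cudaFree(dB); cudaFree(d_ip); cudaFree(d_im);
    if (!ok) return -1;

    int bad = 0;
    for (size_t i = 0; i < n; ++i)
        if (hA[i] != hB[i]) ++bad;  // exact comparison, no tolerance
    return bad;
}

int main() {
    int bad = count_mismatches(10000);  // grid size used in the benchmark
    std::printf("mismatching outputs: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Because both kernels call the same device function on the same inputs and every thread writes a distinct output element, no atomics or synchronization are needed, and the two mappings would be expected to agree exactly. The sketch only checks agreement; it does not reproduce the timings in the table.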

Yuan et al. (2024) achieved a 37x speedup for the WAM6 wave model on GPU, but WW3 has no equivalent GPU implementation. This work fills that gap.

    nvcc -O3 -arch=sm_86 ww3_dia_benchmark.cu -o ww3_dia_bench
    ./ww3_dia_bench

- Project 1: Parallel Prefix Scan for RT — 4.73x GPU speedup
- Project 2: Tensor-Compressed k-Tables — 33x compression
- Project 3: This repo — 16.8x DIA GPU speedup

License: BSD-3-Clause