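This example illustrates the use of the hipBLAS Level 3 Strided Batched General Matrix Multiplication. The hipBLAS GEMM STRIDED BATCHED performs a matrix-matrix operation for a batch of matrices as:

$C_i = \alpha \cdot f(A_i) \cdot f(B_i) + \beta \cdot C_i$

for each $i \in [0, batch\_count - 1]$, where $A_i$, $B_i$ and $C_i$ are the $i$-th matrices of their batches, $\alpha$ and $\beta$ are scalars, and $f(X)$ is one of the following:

- $f(X) = X$ or
- $f(X) = X^T$ (transpose $X$: $X_{ij}^T = X_{ji}$) or
- $f(X) = X^H$ (Hermitian $X$: $X_{ij}^H = \bar X_{ji}$).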
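
The three choices of $f$ differ only in how an element is looked up. The sketch below illustrates this for a column-major matrix; `ElementOp` and `op_element` are hypothetical names introduced for illustration and are not part of hipBLAS.

```cpp
#include <cassert>
#include <complex>

using cfloat = std::complex<float>;

// Hypothetical stand-in for the three operators; not a hipBLAS type.
enum class ElementOp { N, T, C };

// Returns element (i, j) of f(X), where X is stored column-major with
// leading dimension ld (assumed layout for this sketch).
cfloat op_element(ElementOp op, const cfloat* x, int ld, int i, int j)
{
    assert(ld > 0 && i >= 0 && j >= 0);
    switch(op)
    {
        case ElementOp::N: return x[i + j * ld];            // f(X) = X
        case ElementOp::T: return x[j + i * ld];            // f(X) = X^T
        case ElementOp::C: return std::conj(x[j + i * ld]); // f(X) = X^H
    }
    return cfloat{};
}
```

For real-valued matrices the transpose and Hermitian operators coincide, since conjugation leaves real numbers unchanged.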
- Read in command-line parameters.
- Set $f$ operation, set sizes of matrices and get batch count.
- Allocate and initialize the host matrices. Set up $B$ matrix as an identity matrix.
- Initialize gold standard matrix.
- Compute CPU reference result with strided batched subvectors.
- Allocate device memory.
- Copy data from host to device.
- Create a hipBLAS handle.
- Invoke the hipBLAS GEMM STRIDED BATCHED function.
- Copy the result from device to host.
- Destroy the hipBLAS handle, release device memory.
- Validate the output by comparing it to the CPU reference result.
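
The CPU reference step can be sketched as follows. This is a minimal illustration, not the example's actual source: it assumes column-major storage and no transposition ($f(X) = X$), and the function name `gemm_strided_batched_reference` is hypothetical.

```cpp
#include <cassert>

// Hypothetical CPU reference (not part of hipBLAS): computes
// C_i = alpha * A_i * B_i + beta * C_i for every matrix in the batch.
// Assumes column-major storage and f(X) = X for both inputs.
void gemm_strided_batched_reference(int m, int n, int k,
                                    float alpha,
                                    const float* a, int lda, long long stride_a,
                                    const float* b, int ldb, long long stride_b,
                                    float beta,
                                    float* c, int ldc, long long stride_c,
                                    int batch_count)
{
    assert(m > 0 && n > 0 && k > 0);
    assert(lda >= m && ldb >= k && ldc >= m);
    for(int batch = 0; batch < batch_count; ++batch)
    {
        // The i-th matrix starts i * stride elements after the base pointer.
        const float* a_i = a + batch * stride_a;
        const float* b_i = b + batch * stride_b;
        float*       c_i = c + batch * stride_c;
        for(int col = 0; col < n; ++col)
        {
            for(int row = 0; row < m; ++row)
            {
                float acc = 0.0f;
                for(int p = 0; p < k; ++p)
                {
                    acc += a_i[row + p * lda] * b_i[p + col * ldb];
                }
                c_i[row + col * ldc] = alpha * acc + beta * c_i[row + col * ldc];
            }
        }
    }
}
```

With every $B_i$ set to the identity, as in the flow above, the expected result reduces to $\alpha \cdot A_i + \beta \cdot C_i$, which makes the device output easy to check against the gold standard.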
The application provides the following optional command line arguments:
- `-a` or `--alpha`. The scalar value $\alpha$ used in the GEMM operation. Its default value is 1.
- `-b` or `--beta`. The scalar value $\beta$ used in the GEMM operation. Its default value is 1.
- `-c` or `--count`. Batch count. Its default value is 3.
- `-m` or `--m`. The number of rows of matrices $f(A)$ and $C$, which must be greater than 0. Its default value is 5.
- `-n` or `--n`. The number of columns of matrices $f(B)$ and $C$, which must be greater than 0. Its default value is 5.
- `-k` or `--k`. The number of columns of matrix $f(A)$ and rows of matrix $f(B)$, which must be greater than 0. Its default value is 5.

- The performance of numerical multi-linear algebra code can be significantly improved by using tensor contractions [ Y. Shi et al., HiPC, pp 193, 2016. ], therefore most of the hipBLAS functions have `_batched` and `_strided_batched` [ C. Jhurani and P. Mullowney, JPDP Vol 75, pp 133, 2015. ] extensions.

  We can apply the same multiplication operator to several matrices if we combine them into batched matrices. Batched matrix multiplication improves performance for a large number of small matrices. For a constant stride between matrices, further acceleration is available by strided batched GEMM.
- hipBLAS is initialized by calling `hipblasCreate(hipblasHandle_t*)` and it is terminated by calling `hipblasDestroy(hipblasHandle_t)`.
- The pointer mode controls whether scalar parameters must be allocated on the host (`HIPBLAS_POINTER_MODE_HOST`) or on the device (`HIPBLAS_POINTER_MODE_DEVICE`). It is controlled by `hipblasSetPointerMode`.
- The $f$ operator, defined in the Description section, can be
  - `HIPBLAS_OP_N`: identity operator ($f(X) = X$),
  - `HIPBLAS_OP_T`: transpose operator ($f(X) = X^T$) or
  - `HIPBLAS_OP_C`: Hermitian (conjugate transpose) operator ($f(X) = X^H$).
- `hipblasStride`: strides between matrices or vectors in strided_batched functions.
- `hipblas[HSDCZ]gemmStridedBatched`

  Depending on the character matched in `[HSDCZ]`, the GEMM can be computed with different precisions:
  - `H` (half-precision: `hipblasHalf`)
  - `S` (single-precision: `float`)
  - `D` (double-precision: `double`)
  - `C` (single-precision complex: `hipblasComplex`)
  - `Z` (double-precision complex: `hipblasDoubleComplex`).

  Input parameters for `hipblasSgemmStridedBatched`:
  - `hipblasHandle_t handle`
  - `hipblasOperation_t trans_a`: transformation operator on each $A_i$ matrix
  - `hipblasOperation_t trans_b`: transformation operator on each $B_i$ matrix
  - `int m`: number of rows in each $f(A_i)$ and $C_i$ matrix
  - `int n`: number of columns in each $f(B_i)$ and $C_i$ matrix
  - `int k`: number of columns in each $f(A_i)$ matrix and number of rows in each $f(B_i)$ matrix
  - `const float *alpha`: scalar multiplier of each $f(A_i) \cdot f(B_i)$ matrix product
  - `const float *A`: pointer to the first matrix of the $A_i$ batch
  - `int lda`: leading dimension of each $A_i$ matrix
  - `long long stride_a`: stride size for each $A_i$ matrix
  - `const float *B`: pointer to the first matrix of the $B_i$ batch
  - `int ldb`: leading dimension of each $B_i$ matrix
  - `long long stride_b`: stride size for each $B_i$ matrix
  - `const float *beta`: scalar multiplier of each $C_i$ matrix in the addition
  - `float *C`: pointer to the first matrix of the $C_i$ batch
  - `int ldc`: leading dimension of each $C_i$ matrix
  - `long long stride_c`: stride size for each $C_i$ matrix
  - `int batch_count`: number of matrices

  Return value: `hipblasStatus_t`
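
The leading dimension and stride arguments follow from how each batch is stored. The sketch below shows one way to derive them, assuming column-major matrices packed back to back with no padding; `StorageOp`, `BatchLayout` and `packed_layout` are hypothetical names for illustration, and `StorageOp` merely stands in for `hipblasOperation_t` so that the code compiles without hipBLAS.

```cpp
#include <cassert>

// Hypothetical stand-in for hipblasOperation_t (N, T, C).
enum class StorageOp { N, T, C };

// Stored shape of one matrix plus the values that would be passed
// as the leading dimension and the stride of its batch.
struct BatchLayout
{
    int       rows;
    int       cols;
    int       ld;
    long long stride;
};

// f(X) has rows_f x cols_f elements. When X is transposed or conjugate
// transposed, the stored matrix has the two dimensions swapped.
// Assumes tightly packed column-major storage: ld == rows, stride == ld * cols.
BatchLayout packed_layout(StorageOp op, int rows_f, int cols_f)
{
    assert(rows_f > 0 && cols_f > 0);
    BatchLayout layout{};
    layout.rows   = (op == StorageOp::N) ? rows_f : cols_f;
    layout.cols   = (op == StorageOp::N) ? cols_f : rows_f;
    layout.ld     = layout.rows;
    layout.stride = static_cast<long long>(layout.ld) * layout.cols;
    return layout;
}
```

Under these assumptions, `packed_layout(trans_a, m, k)` would give `lda` and `stride_a`, `packed_layout(trans_b, k, n)` would give `ldb` and `stride_b`, and `packed_layout(StorageOp::N, m, n)` would give `ldc` and `stride_c`. A padded layout would use a larger leading dimension and stride.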

hipBLAS:

- `hipblasCreate`
- `hipblasDestroy`
- `hipblasHandle_t`
- `hipblasSgemmStridedBatched`
- `hipblasOperation_t`
- `hipblasStride`
- `hipblasSetPointerMode`
- `HIPBLAS_OP_N`
- `HIPBLAS_POINTER_MODE_HOST`

HIP runtime:

- `hipFree`
- `hipMalloc`
- `hipMemcpy`
- `hipMemcpyDeviceToHost`
- `hipMemcpyHostToDevice`