This program showcases a simple matrix transpose kernel that uses a different codepath depending on the target architecture.
- A number of constants are defined to control the problem details and the kernel launch parameters.
- The input matrix is set up in host memory.
- The necessary amount of device memory is allocated and the input is copied to the device.
- The GPU transpose kernel is launched with the previously defined arguments.
- The kernel will have two different codepaths for its data movement, depending on the target architecture.
- The transposed matrix is copied back to the host and all device memory is freed.
- The elements of the result matrix are compared with the expected result. The result of the comparison is printed to the standard output.
This example showcases two different codepaths inside a GPU kernel, depending on the target architecture.
You may want to use architecture-specific inline assembly when compiling for a specific architecture, without losing compatibility with other architectures (see the inline_assembly example).
The architecture-specific compiler definitions (such as `__gfx1030__`) only exist within GPU kernels. If you would like to have GPU architecture-specific host-side code, you can query the stream/device information at runtime instead.
Device symbols:

- `threadIdx`, `blockIdx`, `blockDim`
- Architecture-specific compiler definitions: `__gfx1010__`, `__gfx1011__`, `__gfx1012__`, `__gfx1030__`, `__gfx1031__`, `__gfx1100__`, `__gfx1101__`, `__gfx1102__`
Host symbols:

- `hipMalloc`
- `hipMemcpy`
- `hipGetLastError`
- `hipFree`
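The host-side call sequence these APIs form can be sketched as follows. This is an illustrative outline, not the example's actual code: the kernel name, matrix size, and launch geometry are hypothetical, error checking is elided for brevity, and it requires a ROCm toolchain and GPU to build and run:

```cpp
#include <hip/hip_runtime.h>
#include <vector>

// Hypothetical kernel signature; the real example's differs.
__global__ void transpose_kernel(float* out, const float* in, unsigned width);

int main()
{
    const unsigned width      = 64;
    const size_t   size_bytes = width * width * sizeof(float);
    std::vector<float> h_in(width * width, 1.f), h_out(width * width);

    // Allocate device memory and copy the input matrix to the device.
    float *d_in = nullptr, *d_out = nullptr;
    hipMalloc(&d_in, size_bytes);
    hipMalloc(&d_out, size_bytes);
    hipMemcpy(d_in, h_in.data(), size_bytes, hipMemcpyHostToDevice);

    // Launch the kernel, then check for launch errors.
    transpose_kernel<<<dim3(width / 8, width / 8), dim3(8, 8)>>>(d_out, d_in, width);
    hipGetLastError();

    // Copy the transposed matrix back and free all device memory.
    hipMemcpy(h_out.data(), d_out, size_bytes, hipMemcpyDeviceToHost);
    hipFree(d_in);
    hipFree(d_out);
}
```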