A stream encapsulates a queue of tasks that are launched on the GPU device. This example showcases usage of multiple streams, each with their own tasks. These tasks include asynchronous memory copies using `hipMemcpyAsync` and asynchronous kernel launches using `myKernelName<<<...>>>`.
- Host-side input and output memory is allocated as pinned memory using `hipHostMalloc`. Pinned memory ensures that the copies issued with `hipMemcpyAsync` are performed asynchronously.
- Host input is instantiated.
- Device-side storage is allocated using `hipMalloc`.
- Two `hipStream_t` streams are created using `hipStreamCreate`. The example launches two different kernels, and each stream queues the tasks related to one of the kernel launches.
- Data is copied from host to device using `hipMemcpyAsync`.
- The two kernels, `matrix_transpose_static_shared` and `matrix_transpose_dynamic_shared`, are launched asynchronously on their respective streams.
- An asynchronous memory copy (using `hipMemcpyAsync`) is queued on each stream to transfer the results from device to host.
- The streams are destroyed using `hipStreamDestroy`.
- The host explicitly waits for all tasks to finish using `hipDeviceSynchronize`.
- Device-side memory is freed using `hipFree`.
- Host-side pinned memory is freed using `hipHostFree`.
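The steps above can be sketched end-to-end as follows. This is a minimal sketch, not the example's actual source: `scale_kernel` is a hypothetical stand-in for the two transpose kernels, and error checking is elided for brevity. It assumes a ROCm/HIP toolchain (compile with `hipcc`) and an available GPU.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Hypothetical stand-in for the example's transpose kernels.
__global__ void scale_kernel(float* out, const float* in, unsigned int size)
{
    const unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if(i < size)
    {
        out[i] = 2.f * in[i];
    }
}

int main()
{
    constexpr unsigned int size       = 1 << 20;
    constexpr size_t       size_bytes = size * sizeof(float);

    // Pinned host memory, so hipMemcpyAsync can copy asynchronously.
    float *h_in, *h_out[2];
    hipHostMalloc(&h_in, size_bytes);
    hipHostMalloc(&h_out[0], size_bytes);
    hipHostMalloc(&h_out[1], size_bytes);
    for(unsigned int i = 0; i < size; ++i)
    {
        h_in[i] = static_cast<float>(i);
    }

    float*      d_in[2];
    float*      d_out[2];
    hipStream_t streams[2];
    for(int s = 0; s < 2; ++s)
    {
        hipMalloc(&d_in[s], size_bytes);
        hipMalloc(&d_out[s], size_bytes);
        hipStreamCreate(&streams[s]);

        // Copy-in, kernel launch and copy-out are queued on the stream and
        // return immediately; the two streams may execute concurrently.
        hipMemcpyAsync(d_in[s], h_in, size_bytes, hipMemcpyHostToDevice, streams[s]);
        scale_kernel<<<dim3(size / 256), dim3(256), 0, streams[s]>>>(d_out[s], d_in[s], size);
        hipMemcpyAsync(h_out[s], d_out[s], size_bytes, hipMemcpyDeviceToHost, streams[s]);
    }

    // Block the host until all tasks queued on the device have finished.
    hipDeviceSynchronize();

    for(int s = 0; s < 2; ++s)
    {
        hipStreamDestroy(streams[s]);
        hipFree(d_in[s]);
        hipFree(d_out[s]);
        printf("stream %d: h_out[1] = %f\n", s, h_out[s][1]);
        hipHostFree(h_out[s]);
    }
    hipHostFree(h_in);
    return 0;
}
```

Note that the queued copies and launches all return immediately on the host; only `hipDeviceSynchronize` blocks, after which the results are safe to read.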
A HIP stream allows device tasks to be grouped and launched asynchronously and independently from other tasks, which can be used to hide latencies and increase task completion throughput. When the results of tasks queued on a particular stream are needed, that stream can be explicitly synchronized without blocking work queued on other streams. Each HIP stream is tied to a particular device, which enables HIP streams to be used to schedule work across multiple devices simultaneously.
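The per-stream synchronization and multi-device scheduling mentioned above could be sketched like this. This fragment is not part of the example; it assumes at least one visible device and omits the actual per-device workload and error checking.

```cpp
#include <hip/hip_runtime.h>
#include <vector>

int main()
{
    int device_count = 0;
    hipGetDeviceCount(&device_count);

    // hipStreamCreate ties each new stream to the device that is
    // current at creation time, giving one work queue per device.
    std::vector<hipStream_t> streams(device_count);
    for(int d = 0; d < device_count; ++d)
    {
        hipSetDevice(d);
        hipStreamCreate(&streams[d]);
        // ... queue per-device copies and kernel launches on streams[d] ...
    }

    // Wait only for the tasks queued on the first stream, without
    // blocking work still running on the other devices' streams.
    hipStreamSynchronize(streams[0]);

    for(int d = 0; d < device_count; ++d)
    {
        hipSetDevice(d);
        hipStreamDestroy(streams[d]);
    }
    return 0;
}
```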
- `__shared__`
- `__syncthreads`
- `hipStream_t`
- `hipStreamCreate`
- `hipStreamDestroy`
- `hipMalloc`
- `hipHostMalloc`
- `hipMemcpyAsync`
- `hipDeviceSynchronize`
- `hipFree`
- `hipHostFree`