Commit d824043

All new 6.0.1 modifications to existing files.
1 parent 415024c commit d824043

File tree

160 files changed

+1661
-1359
lines changed

Chap_SIMD.tex

Lines changed: 5 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -11,21 +11,21 @@
1111
Loops without loop-carried backward dependences (or with dependences preserved using
1212
\kcode{ordered simd}) are candidates for vectorization by the compiler for
1313
execution with SIMD units. In addition, with state-of-the-art vectorization
14-
technology and \kcode{declare simd} directive extensions for function vectorization
15-
in the OpenMP 4.5 specification, loops with function calls can be vectorized as well.
14+
technology and \kcode{declare_simd} directive extensions for function vectorization
15+
in the OpenMP Specification, loops with function calls can be vectorized as well.
1616
The basic idea is that a scalar function call in a loop can be replaced by a vector version
1717
of the function, and the loop can be vectorized simultaneously by combining a loop
1818
vectorization (\kcode{simd} directive on the loop) and a function
19-
vectorization (\kcode{declare simd} directive on the function).
19+
vectorization (\kcode{declare_simd} directive on the function).
2020

2121
A \kcode{simd} construct states that SIMD operations be performed on the
22-
data within the loop. A number of clauses are available to provide
22+
data within the loop. A number of clauses are available to provide
2323
data-sharing attributes (\kcode{private}, \kcode{linear}, \kcode{reduction} and
2424
\kcode{lastprivate}). Other clauses provide vector length preference/restrictions
2525
(\kcode{simdlen} / \kcode{safelen}), loop fusion (\kcode{collapse}), and data
2626
alignment (\kcode{aligned}).
2727

28-
The \kcode{declare simd} directive designates
28+
The \kcode{declare_simd} directive designates
2929
that a vector version of the function should also be constructed for
3030
execution within loops that contain the function and have a \kcode{simd}
3131
directive. Clauses provide argument specifications (\kcode{linear},
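
A minimal sketch of the combined loop and function vectorization that this hunk describes; it is not taken from the commit. The names `scale_add` and `saxpy_like` are hypothetical, and the directive is written with the spaced spelling `declare simd` that existing compilers accept, whereas the updated text refers to it as `declare_simd`.

```c
#include <stddef.h>

/* Hypothetical helper: request that the compiler also build a vector
 * version of this function. `scale` has the same value for every lane. */
#pragma omp declare simd uniform(scale) notinbranch
static float scale_add(float x, float y, float scale)
{
    return scale * x + y;
}

/* Hypothetical wrapper: the simd directive on the loop lets the compiler
 * replace the scalar call with the vector version of scale_add. */
void saxpy_like(size_t n, float scale,
                const float *x, const float *y, float *out)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        out[i] = scale_add(x[i], y[i], scale);
    }
}
```

If the code is compiled without OpenMP support, the pragmas are ignored and the loop runs as ordinary scalar code, so the results are the same either way. Whether vector code is actually generated is up to the compiler.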

Chap_affinity.tex

Lines changed: 53 additions & 68 deletions
Original file line number | Diff line number | Diff line change
@@ -1,74 +1,59 @@
11
\cchapter{OpenMP Affinity}{affinity}
22
\label{chap:openmp_affinity}
33

4-
OpenMP Affinity consists of a \kcode{proc_bind} policy (thread affinity policy) and a specification of
5-
places (``location units'' or \plc{processors} that may be cores, hardware
6-
threads, sockets, etc.).
7-
OpenMP Affinity enables users to bind computations on specific places.
8-
The placement will hold for the duration of the parallel region.
9-
However, the runtime is free to migrate the OpenMP threads
10-
to different cores (hardware threads, sockets, etc.) prescribed within a given place,
11-
if two or more cores (hardware threads, sockets, etc.) have been assigned to a given place.
12-
13-
Often the binding can be managed without resorting to explicitly setting places.
14-
Without the specification of places in the \kcode{OMP_PLACES} variable,
15-
the OpenMP runtime will distribute and bind threads using the entire range of processors for
16-
the OpenMP program, according to the \kcode{OMP_PROC_BIND} environment variable
17-
or the \kcode{proc_bind} clause. When places are specified, the OMP runtime
18-
binds threads to the places according to a default distribution policy, or
19-
those specified in the \kcode{OMP_PROC_BIND} environment variable or the
20-
\kcode{proc_bind} clause.
21-
22-
In the OpenMP Specifications document a processor refers to an execution unit that
23-
is enabled for an OpenMP thread to use. A processor is a core when there is
24-
no SMT (Simultaneous Multi-Threading) support or SMT is disabled. When
25-
SMT is enabled, a processor is a hardware thread (HW-thread). (This is the
26-
usual case; but actually, the execution unit is implementation defined.) Processor
27-
numbers are numbered sequentially from 0 to the number of cores less one (without SMT), or
28-
0 to the number HW-threads less one (with SMT). OpenMP places use the processor number to designate
29-
binding locations (unless an ``abstract name'' is used.)
30-
31-
32-
The processors available to a process may be a subset of the system's
33-
processors. This restriction may be the result of a
34-
wrapper process controlling the execution (such as \plc{numactl} on Linux systems),
35-
compiler options, library-specific environment variables, or default
36-
kernel settings. For instance, the execution of multiple MPI processes,
37-
launched on a single compute node, will each have a subset of processors as
38-
determined by the MPI launcher or set by MPI affinity environment
39-
variables for the MPI library. %Forked threads within an MPI process
40-
%(for a hybrid execution of MPI and OpenMP code) inherit the valid
41-
%processor set for execution from the parent process (the initial task region)
42-
%when a parallel region forks threads. The binding policy set in
43-
%\code{OMP\_PROC\_BIND} or the \code{proc\_bind} clause will be applied to
44-
%the subset of processors available to \plc{the particular} MPI process.
45-
46-
%Also, setting an explicit list of processor numbers in the \code{OMP\_PLACES}
47-
%variable before an MPI launch (which involves more than one MPI process) will
48-
%result in unspecified behavior (and doesn't make sense) because the set of
49-
%processors in the places list must not contain processors outside the subset
50-
%of processors for an MPI process. A separate \code{OMP\_PLACES} variable must
51-
%be set for each MPI process, and is usually accomplished by launching a script
52-
%which sets \code{OMP\_PLACES} specifically for the MPI process.
53-
54-
Threads of a team are positioned onto places in a compact manner, a
55-
scattered distribution, or onto the primary thread's place, by setting the
56-
\kcode{OMP_PROC_BIND} environment variable or the \kcode{proc_bind} clause to
57-
\kcode{close}, \kcode{spread}, or \kcode{primary} (\kcode{master} has been deprecated), respectively. When
58-
\kcode{OMP_PROC_BIND} is set to FALSE no binding is enforced; and
59-
when the value is TRUE, the binding is implementation defined to
60-
a set of places in the \kcode{OMP_PLACES} variable or to places
61-
defined by the implementation if the \kcode{OMP_PLACES} variable
62-
is not set.
63-
64-
The \kcode{OMP_PLACES} variable can also be set to an abstract name
65-
(\kcode{threads}, \kcode{cores}, \kcode{sockets}) to specify that a place is
66-
either a single hardware thread, a core, or a socket, respectively.
67-
This description of the \kcode{OMP_PLACES} is most useful when the
68-
number of threads is equal to the number of hardware thread, cores
69-
or sockets. It can also be used with a \kcode{close} or \kcode{spread}
70-
distribution policy when the equality doesn't hold.
71-
4+
OpenMP defines \emph{thread affinity} with respect to \emph{places}, where a
5+
place is an abstraction that represents a set of processors (e.g., one or more
6+
processor IDs, a hardware thread, a core, a socket, etc.). Thread affinity
7+
control enables users to assign threads that perform computation in a parallel
8+
region to specific places, while allowing the runtime implementation to freely
9+
migrate threads to different execution units within a given place. A thread
10+
that is assigned to a place for a given parallel region remains bound to that
11+
place for the duration of that region.
12+
13+
The places available for thread affinity control (referred to as a \emph{place
14+
partition}) can be set via the \kcode{OMP_PLACES} environment variable. The
15+
binding of threads to places can be managed explicitly or handled implicitly.
16+
Without the \kcode{OMP_PLACES} variable being set, the initial place partition
17+
is implementation defined. The method by which threads are assigned to places
18+
for a given parallel region is determined by the specified thread affinity
19+
policy. This policy can be set via the \kcode{OMP_PROC_BIND} environment
20+
variable or can be explicitly set for a particular \kcode{parallel} construct
21+
with the \kcode{proc_bind} clause.
22+
23+
The OpenMP specification document defines a \emph{processor} as a hardware
24+
execution unit on which one or more OpenMP threads may execute. The actual
25+
hardware mechanism that a given processor ID represents depends on the
26+
implementation and architecture. For example, a processor could correspond to a
27+
core on the device that does not have simultaneous multi-threading (SMT)
28+
support or for which SMT is disabled. For an SMT-enabled device, by contrast, a
29+
processor could correspond to a hardware thread. Processor IDs are the
30+
resulting sequential numbering of processors, starting from 0. The initial
31+
place partition can be defined explicitly with processor IDs or using an
32+
\emph{abstract name}. For example, \pout{OMP_PLACES="\{0,1\},\{2,3\}"}
33+
defines two places in the initial place partition, the first place consisting
34+
of processors 0 and 1 from the device and the second place consisting of
35+
processors 2 and 3 from the device. Alternatively, \pout{OMP_PLACES="cores"}
36+
defines one place per core on the host device.
37+
38+
The processors that are available to an OpenMP program process may be a subset
39+
of the processors on the system. This restriction may be the result of a
40+
wrapper process controlling the execution (such as \ucode{numactl} on Linux
41+
systems), compiler options, library-specific environment variables, or default
42+
kernel settings. For instance, multiple MPI processes launched on a single
43+
compute node will each have a subset of processors as determined
44+
by the MPI launcher or set by MPI affinity environment variables for the MPI
45+
library.
46+
47+
The threads that are under affinity control for a given parallel region include
48+
the threads assigned to its team and additionally any free-agent threads (see
49+
Section~\ref{sec:free_agent}) that execute tasks bound to the region. Affinity
50+
control for threads can be disabled (i.e., allowing threads to migrate freely
51+
across processors) by setting \kcode{OMP_PROC_BIND} to \vcode{false}. If instead
52+
\kcode{OMP_PROC_BIND} is \vcode{true}, then threads will bind to places but the
53+
places to which they bind are implementation defined. Finally, three affinity
54+
policies that are more prescriptive are available via the environment variable
55+
or the \kcode{proc_bind} clause: \kcode{spread}, \kcode{close}, and
56+
\kcode{primary}. These are detailed in the following section.
7257

7358
% We need an example of using sockets, cores and threads:
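
One way to observe the place partition and binding policy described in the new text is to print the place number of each thread. The following is an illustrative sketch, not part of the commit: the function name `count_team_threads` is hypothetical, `num_threads(4)` is an arbitrary choice, and the `omp_get_place_num` and `omp_get_num_places` routines are called only when the compiler defines `_OPENMP`.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: starts a parallel region that requests the
 * spread policy and returns how many threads actually took part. */
int count_team_threads(void)
{
    int count = 0;

    #pragma omp parallel proc_bind(spread) num_threads(4) reduction(+:count)
    {
#ifdef _OPENMP
        /* Report the place this thread is bound to for the region. */
        printf("thread %d: place %d of %d\n",
               omp_get_thread_num(),
               omp_get_place_num(),
               omp_get_num_places());
#endif
        count += 1;
    }
    return count;
}
```

Running the program with `OMP_PLACES="{0,1},{2,3}"`, the setting used in the text above, gives a two-place partition for the threads to be spread across. The exact thread-to-place assignment depends on the implementation and the hardware, so no particular output is shown here.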
7459

Chap_data_environment.tex

Lines changed: 68 additions & 45 deletions
Original file line number | Diff line number | Diff line change
@@ -1,32 +1,42 @@
11
\cchapter{Data Environment}{data_environment}
22
\label{chap:data_environment}
3-
The OpenMP \plc{data environment} contains data attributes of variables and
4-
objects. Many constructs (such as \kcode{parallel}, \kcode{simd}, \kcode{task})
5-
accept clauses to control \plc{data-sharing} attributes
6-
of referenced variables in the construct, where \plc{data-sharing} applies to
7-
whether the attribute of the variable is \plc{shared},
8-
is \plc{private} storage, or has special operational characteristics
9-
(as found in the \kcode{firstprivate}, \kcode{lastprivate}, \kcode{linear}, or \kcode{reduction} clause).
3+
An OpenMP \emph{data environment} is defined by a set of variables or objects
4+
and their \emph{data-environment attributes}. Data-environment attributes can
5+
be divided into \emph{data-sharing attributes} and \emph{data-mapping
6+
attributes}.
107

11-
The data environment for a device (distinguished as a \plc{device data environment})
12-
is controlled on the host by \plc{data-mapping} attributes, which determine the
13-
relationship of the data on the host, the \plc{original} data, and the data on the
14-
device, the \plc{corresponding} data.
8+
Many constructs (such as \kcode{parallel}, \kcode{simd}, \kcode{task})
9+
accept clauses to control the data-sharing attributes of variables referenced in
10+
the construct. A data-sharing attribute determines whether a variable is
11+
\emph{shared} or \emph{private}, and whether a private variable has additional
12+
operational characteristics (as indicated by the
13+
\kcode{firstprivate}, \kcode{lastprivate}, \kcode{linear}, or \kcode{reduction}
14+
clause).
15+
16+
Variables and objects in the data environment for a target device
17+
(distinguished as a device data environment) have \emph{data-mapping
18+
attributes} that are controlled by data-mapping constructs (such as
19+
\kcode{target} or \kcode{target_data}), which determine the relationship of the
20+
data on the host (the \emph{original} data) and the data on the device (the
21+
\emph{corresponding} data).
1522

1623
\bigskip
1724
DATA-SHARING ATTRIBUTES
1825

19-
Data-sharing attributes of variables can be classified as being \plc{predetermined},
20-
\plc{explicitly determined} or \plc{implicitly determined}.
26+
Data-sharing attributes of variables can be classified as being \emph{predetermined},
27+
\emph{explicitly determined} or \emph{implicitly determined}.
2128

2229
Certain variables and objects have predetermined attributes.
2330
A commonly found case is the loop iteration variable in associated loops
24-
of a \kcode{for} or \kcode{do} construct. It has a private data-sharing attribute.
25-
Variables with predetermined data-sharing attributes cannot be listed in a data-sharing clause; but there are some
26-
exceptions (mainly concerning loop iteration variables).
31+
of a \kcode{for} or \kcode{do} construct. It has a private data-sharing
32+
attribute. Certain declarative directives can also be used to define variables
33+
as having special predetermined data-sharing attributes including \emph{threadprivate},
34+
\emph{groupprivate}, and \emph{device-local}. Variables with predetermined
35+
data-sharing attributes cannot usually be listed in a data-sharing clause, but there
36+
are some exceptions (mainly concerning loop iteration variables).
2737

2838
Variables with explicitly determined data-sharing attributes are those that are
29-
referenced in a given construct and are listed in a data-sharing attribute
39+
referenced in a given construct and are listed in a data-sharing
3040
clause on the construct. Some of the common data-sharing clauses are:
3141
\kcode{shared}, \kcode{private}, \kcode{firstprivate}, \kcode{lastprivate},
3242
\kcode{linear}, and \kcode{reduction}. % Are these all of them?
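
The three classes of data-sharing attributes can be sketched in a few lines of C. This example is an illustration under stated assumptions rather than content from the commit: the function `sum_of_squares` and its variable names are hypothetical. The loop variable `i` has a predetermined private attribute, and `default(none)` forces every other referenced variable to be listed explicitly.

```c
/* Hypothetical helper: sums (v[k] - offset)^2 over n elements. */
double sum_of_squares(int n, const double *v, double offset)
{
    double total = 0.0;   /* combined across threads by reduction(+) */
    double tmp;           /* scratch value, one copy per thread */
    int i;                /* loop iteration variable: predetermined private */

    #pragma omp parallel for default(none) shared(n, v) \
            firstprivate(offset) private(tmp) reduction(+:total)
    for (i = 0; i < n; i++) {
        tmp = v[i] - offset;   /* private copy, uninitialized on entry */
        total += tmp * tmp;    /* each thread accumulates a partial sum */
    }
    return total;
}
```

Here `offset` is copied into each thread with its original value (`firstprivate`), while `tmp` starts uninitialized in each thread (`private`) and is assigned before it is read.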
@@ -38,44 +48,57 @@
3848
For a complete list of variables and objects with predetermined and
3949
implicitly determined attributes, please refer to the
4050
\docref{Data-sharing Attribute Rules for Variables Referenced in a Construct}
41-
subsection of the OpenMP Specifications document.
51+
subsection of the OpenMP Specification document.
4252

4353
\bigskip
4454
DATA-MAPPING ATTRIBUTES
4555

46-
The \kcode{map} clause on a device construct explicitly specifies how the list items in
47-
the clause are mapped from the encountering task's data environment (on the host)
48-
to the corresponding item in the device data environment (on the device).
49-
The common \plc{list items} are arrays, array sections, scalars, pointers, and
50-
structure elements (members).
51-
52-
Procedures and global variables have predetermined data mapping if they appear
53-
within the list or block of a \kcode{declare target} directive. Also, a C/C++ pointer
54-
is mapped as a zero-length array section, as is a C++ variable that is a reference to a pointer.
55-
% Waiting for response from Eric on this.
56+
A data-mapping attribute determines the manner in which a variable or object is
57+
mapped from a data environment of a task (typically on the host device) to a
58+
device data environment on a different device. The specification of list items
59+
in a \kcode{map} clause is the main mechanism for controlling the data-mapping
60+
attributes of data in a device data environment. These list items may include
61+
variables, including array and structure elements, array sections, as well as
62+
more general lvalue expressions in C/C++ (such as a dereferenced expression of
63+
pointer type).
5664

57-
Without explicit mapping, non-scalar and non-pointer variables within the scope of the \kcode{target}
58-
construct are implicitly mapped with a \plc{map-type} of \kcode{tofrom}.
59-
Without explicit mapping, scalar variables within the scope of the \kcode{target}
60-
construct are not mapped, but have an implicit firstprivate data-sharing
61-
attribute. (That is, the value of the original variable is given to a private
62-
variable of the same name on the device.) This behavior can be changed with
63-
the \kcode{defaultmap} clause.
65+
If a \kcode{map} clause is not explicitly specified for a variable that is
66+
referenced in a \kcode{target} construct, that variable may still have an
67+
\emph{implicit} data-mapping attribute (as if it had appeared in a \kcode{map}
68+
clause). For example, the use of a declare target directive or
69+
\kcode{defaultmap} clause can result in a variable having an implicit
70+
data-mapping attribute. Additionally, list items that appear in certain
71+
data-sharing clauses (e.g., \kcode{reduction}) on a compound target construct
72+
can imply a data-mapping attribute. Also, non-scalar variables referenced
73+
inside a \kcode{target} construct that do not otherwise have a predetermined or
74+
explicit data-sharing or data-mapping attribute will typically be implicitly
75+
mapped by default, in contrast to scalar variables which are typically given
76+
an implicit firstprivate attribute (these default implicit attributes can be
77+
changed with the use of the \kcode{defaultmap} clause). For a complete set of
78+
rules for implicit data-mapping attributes, refer to the
79+
\docref{Implicit Data-Mapping Attribute Rules}
80+
subsection of the OpenMP Specification document.
6481

65-
The \kcode{map} clause can appear on \kcode{target}, \kcode{target data} and
66-
\kcode{target enter/exit data} constructs. The operations of creation and
67-
removal of device storage as well as assignment of the original list item
68-
values to the corresponding list items may be complicated when the list
69-
item appears on multiple constructs or when the host and device storage
70-
is shared. In these cases the item's reference count, the number of times
71-
it has been referenced (increment by 1 on entry and decrement by 1 on exit) in nested (structured)
72-
map regions and/or accumulative (unstructured) mappings, determines the operation.
73-
Details of the \kcode{map} clause and reference count operation are specified
74-
in the \docref{\kcode{map} Clause} subsection of the OpenMP Specifications document.
82+
The \kcode{map} clause can appear on data-mapping constructs (specifically,
83+
\kcode{target}, \kcode{target_data}, \kcode{target_enter_data} and
84+
\kcode{target_exit_data}). The operations of creation and removal of corresponding
85+
storage as well as assignment of the original list item values to the
86+
corresponding list items may be complicated when the list item appears on
87+
multiple constructs that are executed concurrently. To accommodate this, a
88+
reference count is maintained to determine which of those operations are
89+
needed. This can help ensure that corresponding storage is not removed on
90+
completion of one construct while another construct that has mapped the same
91+
data still requires it, as well as elide data transfers between devices in
92+
cases where corresponding storage is not being created or removed (though this
93+
can be overridden with use of modifiers such as \kcode{delete} or
94+
\kcode{always}). Details of the \kcode{map} clause and reference count
95+
operations are specified in the \docref{\kcode{map} Clause} subsection of the
96+
OpenMP Specification document.
7597

7698

7799
%===== Examples Sections =====
78100
\input{data_environment/threadprivate}
101+
\input{data_environment/groupprivate}
79102
\input{data_environment/default_none}
80103
\input{data_environment/private}
81104
\input{data_environment/fort_loopvar}
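
The explicit and implicit data-mapping rules summarized in this hunk can be sketched as follows. This is a hedged illustration, not part of the commit: the function `vec_scale` is hypothetical, and the combined `target teams distribute parallel for` construct is only one of several ways to offload the loop. The array sections are mapped explicitly, while the scalars `a` and `n` have no `map` clause and are therefore expected to be treated as firstprivate by default.

```c
/* Hypothetical helper: out[i] = a * in[i], offloaded when a device is
 * available and executed on the host otherwise. */
void vec_scale(int n, double a, const double *in, double *out)
{
    /* in[0:n] is copied to the device on entry; out[0:n] is copied
     * back to the host on exit. */
    #pragma omp target teams distribute parallel for \
            map(to: in[0:n]) map(from: out[0:n])
    for (int i = 0; i < n; i++) {
        out[i] = a * in[i];
    }
}
```

Using `to` and `from` instead of `tofrom` avoids transferring `in` back to the host and `out` to the device, neither of which is needed here.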
