Commit f0221fb

[OpenMP] Add option to use different units for blocktime
This change adds the option of using different units for blocktimes specified via the KMP_BLOCKTIME environment variable. The parsing of the environment now recognizes units suffixes: ms and us. If a units suffix is not specified, the default unit is ms. Thus default behavior is still the same, and any previous usage still works the same. Internally, blocktime is now converted to microseconds everywhere, so settings that exceed INT_MAX in microseconds are considered "infinite".

kmp_set/get_blocktime are updated to use the units the user specified with KMP_BLOCKTIME, and if not specified, ms are used.

Added better range checking and inform messages for the two time units. Large values of blocktime for default (ms) case (beyond INT_MAX/1000) are no longer allowed, but will autocorrect with an INFORM message.

The delay for determining ticks per usec was lowered. It is now 1 million ticks which was calculated as ~450us based on 2.2GHz clock which is pretty typical base clock frequency on X86:
(1e6 Ticks) / (2.2e9 Ticks/sec) * (1e6 usec/sec) = 454 usec
Really short benchmarks can be affected by longer delay.

Update KMP_BLOCKTIME docs.

Portions of this commit were authored by Johnny Peyton.

Differential Revision: https://reviews.llvm.org/D157646
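The parsing and range rules described in the commit message can be illustrated with a minimal sketch. This is not the runtime's actual parser: `parse_blocktime_us` is a hypothetical helper, and reporting `'m'` as the units for `infinite` is an assumption made only for this example. It accepts an optional `ms` or `us` suffix, defaults to milliseconds, stores the result in microseconds, and clamps oversized values the way the message describes (to INT_MAX/1000 ms, or to INT_MAX us, which is the "infinite" setting).

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not part of the OpenMP runtime): converts a
// KMP_BLOCKTIME-style string into microseconds. Returns false when the text
// is not a non-negative integer with an optional "ms" or "us" suffix.
static bool parse_blocktime_us(const char *text, int *out_us, char *out_units) {
  if (std::strcmp(text, "infinite") == 0) {
    *out_us = INT_MAX; // INT_MAX is the "infinite" setting
    *out_units = 'm';  // assumption: report the default units
    return true;
  }
  errno = 0;
  char *end = nullptr;
  long long value = std::strtoll(text, &end, 10);
  if (end == text || value < 0)
    return false;
  bool overflow = (errno == ERANGE);

  char units;
  if (*end == '\0' || std::strcmp(end, "ms") == 0)
    units = 'm'; // no suffix means milliseconds
  else if (std::strcmp(end, "us") == 0)
    units = 'u';
  else
    return false; // other suffixes such as "s" are rejected in this sketch

  // Clamp so that the microsecond value always fits in an int.
  long long scale = (units == 'm') ? 1000 : 1;
  long long limit = INT_MAX / scale;
  if (overflow || value > limit)
    value = limit;

  *out_us = (int)(value * scale);
  *out_units = units;
  return true;
}
```

With this sketch, `"200"` and `"200ms"` both yield 200000 microseconds, `"500us"` yields 500, and a millisecond value above INT_MAX/1000 is stored as 2147483000 microseconds.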
1 parent bbbb93e commit f0221fb

11 files changed, with 160 additions and 122 deletions.

openmp/docs/design/Runtimes.rst

Lines changed: 23 additions & 23 deletions
@@ -409,17 +409,17 @@ threads in the team was reduced, but the program will continue execution.
 KMP_BLOCKTIME
 """""""""""""
 
-Sets the time, in milliseconds, that a thread should wait, after completing
-the execution of a parallel region, before sleeping.
+Sets the time that a thread should wait, after completing the
+execution of a parallel region, before sleeping.
 
-Use the optional character suffixes: ``s`` (seconds), ``m`` (minutes),
-``h`` (hours), or ``d`` (days) to specify the units.
+Use the optional suffixes: ``ms`` (milliseconds), or ``us`` (microseconds) to
+specify/change the units. Defaults units is milliseconds.
 
-Specify infinite for an unlimited wait time.
+Specify ``infinite`` for an unlimited wait time.
 
 | **Default:** 200 milliseconds
 | **Related Environment Variable:** ``KMP_LIBRARY``
-| **Example:** ``KMP_BLOCKTIME=1s``
+| **Example:** ``KMP_BLOCKTIME=1ms``
 
 KMP_CPUINFO_FILE
 """"""""""""""""
@@ -1341,22 +1341,22 @@ This is the maximum amount of time the client will wait for a response from the
 LLVM/OpenMP support for C library routines
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-Support for calling standard C library routines on GPU targets is provided by
-the `LLVM C Library <https://libc.llvm.org/gpu/>`_. This project provides two
-static libraries, ``libcgpu.a`` and ``libllvmlibc_rpc_server.a``, which are used
-by the OpenMP runtime to provide ``libc`` support. The ``libcgpu.a`` library
-contains the GPU device code, while ``libllvmlibc_rpc_server.a`` provides the
-interface to the RPC interface. More information on the RPC construction can be
+Support for calling standard C library routines on GPU targets is provided by
+the `LLVM C Library <https://libc.llvm.org/gpu/>`_. This project provides two
+static libraries, ``libcgpu.a`` and ``libllvmlibc_rpc_server.a``, which are used
+by the OpenMP runtime to provide ``libc`` support. The ``libcgpu.a`` library
+contains the GPU device code, while ``libllvmlibc_rpc_server.a`` provides the
+interface to the RPC interface. More information on the RPC construction can be
 found in the `associated documentation <https://libc.llvm.org/gpu/rpc.html>`_.
 
-To provide host services, we run an RPC server inside of the runtime. This
-allows the host to respond to requests made from the GPU asynchronously. For
-``libc`` calls that require an RPC server, such as printing, an external handle
-to the RPC client running on the GPU will be present in the GPU executable. If
-we find this symbol, we will initialize a client and server and run it in the
+To provide host services, we run an RPC server inside of the runtime. This
+allows the host to respond to requests made from the GPU asynchronously. For
+``libc`` calls that require an RPC server, such as printing, an external handle
+to the RPC client running on the GPU will be present in the GPU executable. If
+we find this symbol, we will initialize a client and server and run it in the
 background while the kernel is executing.
 
-For example, consider the following simple OpenMP offloading code. Here we will
+For example, consider the following simple OpenMP offloading code. Here we will
 simply print a string to the user from the GPU.
 
 .. code-block:: c++
@@ -1368,11 +1368,11 @@ simply print a string to the user from the GPU.
      { fputs("Hello World!\n", stderr); }
    }
 
-We can compile this using the ``libcgpu.a`` library to resolve the symbols.
-Because this function requires RPC support, this will also pull in an externally
-visible symbol called ``__llvm_libc_rpc_client`` into the device image. When
-loading the device image, the runtime will check for this symbol and initialize
-an RPC interface if it is found. The following example shows the RPC server
+We can compile this using the ``libcgpu.a`` library to resolve the symbols.
+Because this function requires RPC support, this will also pull in an externally
+visible symbol called ``__llvm_libc_rpc_client`` into the device image. When
+loading the device image, the runtime will check for this symbol and initialize
+an RPC interface if it is found. The following example shows the RPC server
 being used.
 
 .. code-block:: console

openmp/runtime/src/kmp.h

Lines changed: 21 additions & 8 deletions
@@ -180,6 +180,7 @@ class kmp_stats_list;
 
 #define KMP_NSEC_PER_SEC 1000000000L
 #define KMP_USEC_PER_SEC 1000000L
+#define KMP_NSEC_PER_USEC 1000L
 
 /*!
 @ingroup BASIC_TYPES
@@ -1190,13 +1191,13 @@ extern void __kmp_init_target_task();
 #define KMP_MAX_STKPADDING (2 * 1024 * 1024)
 
 #define KMP_BLOCKTIME_MULTIPLIER \
-  (1000) /* number of blocktime units per second */
+  (1000000) /* number of blocktime units per second */
 #define KMP_MIN_BLOCKTIME (0)
 #define KMP_MAX_BLOCKTIME \
   (INT_MAX) /* Must be this for "infinite" setting the work */
 
-/* __kmp_blocktime is in milliseconds */
-#define KMP_DEFAULT_BLOCKTIME (__kmp_is_hybrid_cpu() ? (0) : (200))
+/* __kmp_blocktime is in microseconds */
+#define KMP_DEFAULT_BLOCKTIME (__kmp_is_hybrid_cpu() ? (0) : (200000))
 
 #if KMP_USE_MONITOR
 #define KMP_DEFAULT_MONITOR_STKSIZE ((size_t)(64 * 1024))
@@ -1223,22 +1224,21 @@ extern void __kmp_init_target_task();
 #if KMP_OS_UNIX && (KMP_ARCH_X86 || KMP_ARCH_X86_64)
 // HW TSC is used to reduce overhead (clock tick instead of nanosecond).
 extern kmp_uint64 __kmp_ticks_per_msec;
+extern kmp_uint64 __kmp_ticks_per_usec;
 #if KMP_COMPILER_ICC || KMP_COMPILER_ICX
 #define KMP_NOW() ((kmp_uint64)_rdtsc())
 #else
 #define KMP_NOW() __kmp_hardware_timestamp()
 #endif
-#define KMP_NOW_MSEC() (KMP_NOW() / __kmp_ticks_per_msec)
 #define KMP_BLOCKTIME_INTERVAL(team, tid) \
-  (KMP_BLOCKTIME(team, tid) * __kmp_ticks_per_msec)
+  ((kmp_uint64)KMP_BLOCKTIME(team, tid) * __kmp_ticks_per_usec)
 #define KMP_BLOCKING(goal, count) ((goal) > KMP_NOW())
 #else
 // System time is retrieved sporadically while blocking.
 extern kmp_uint64 __kmp_now_nsec();
 #define KMP_NOW() __kmp_now_nsec()
-#define KMP_NOW_MSEC() (KMP_NOW() / KMP_USEC_PER_SEC)
 #define KMP_BLOCKTIME_INTERVAL(team, tid) \
-  (KMP_BLOCKTIME(team, tid) * KMP_USEC_PER_SEC)
+  ((kmp_uint64)KMP_BLOCKTIME(team, tid) * (kmp_uint64)KMP_NSEC_PER_USEC)
 #define KMP_BLOCKING(goal, count) ((count) % 1000 != 0 || (goal) > KMP_NOW())
 #endif
 #endif // KMP_USE_MONITOR
@@ -3351,9 +3351,22 @@ extern int __kmp_tp_capacity; /* capacity of __kmp_threads if threadprivate is
                                  used (fixed) */
 extern int __kmp_tp_cached; /* whether threadprivate cache has been created
                                (__kmpc_threadprivate_cached()) */
-extern int __kmp_dflt_blocktime; /* number of milliseconds to wait before
+extern int __kmp_dflt_blocktime; /* number of microseconds to wait before
                                     blocking (env setting) */
+extern char __kmp_blocktime_units; /* 'm' or 'u' to note units specified */
 extern bool __kmp_wpolicy_passive; /* explicitly set passive wait policy */
+
+// Convert raw blocktime from ms to us if needed.
+static inline void __kmp_aux_convert_blocktime(int *bt) {
+  if (__kmp_blocktime_units == 'm') {
+    if (*bt > INT_MAX / 1000) {
+      *bt = INT_MAX / 1000;
+      KMP_INFORM(MaxValueUsing, "kmp_set_blocktime(ms)", bt);
+    }
+    *bt = *bt * 1000;
+  }
+}
+
 #if KMP_USE_MONITOR
 extern int
     __kmp_monitor_wakeups; /* number of times monitor wakes up per second */

openmp/runtime/src/kmp_csupport.cpp

Lines changed: 3 additions & 2 deletions
@@ -2065,14 +2065,15 @@ void kmpc_set_stacksize_s(size_t arg) {
 }
 
 void kmpc_set_blocktime(int arg) {
-  int gtid, tid;
+  int gtid, tid, bt = arg;
   kmp_info_t *thread;
 
   gtid = __kmp_entry_gtid();
   tid = __kmp_tid_from_gtid(gtid);
   thread = __kmp_thread_from_gtid(gtid);
 
-  __kmp_aux_set_blocktime(arg, thread, tid);
+  __kmp_aux_convert_blocktime(&bt);
+  __kmp_aux_set_blocktime(bt, thread, tid);
 }
 
 void kmpc_set_library(int arg) {

openmp/runtime/src/kmp_ftn_entry.h

Lines changed: 14 additions & 9 deletions
@@ -112,17 +112,19 @@ void FTN_STDCALL FTN_SET_BLOCKTIME(int KMP_DEREF arg) {
 #ifdef KMP_STUB
   __kmps_set_blocktime(KMP_DEREF arg);
 #else
-  int gtid, tid;
+  int gtid, tid, bt = (KMP_DEREF arg);
   kmp_info_t *thread;
 
   gtid = __kmp_entry_gtid();
   tid = __kmp_tid_from_gtid(gtid);
   thread = __kmp_thread_from_gtid(gtid);
 
-  __kmp_aux_set_blocktime(KMP_DEREF arg, thread, tid);
+  __kmp_aux_convert_blocktime(&bt);
+  __kmp_aux_set_blocktime(bt, thread, tid);
 #endif
 }
 
+// Gets blocktime in units used for KMP_BLOCKTIME, ms otherwise
 int FTN_STDCALL FTN_GET_BLOCKTIME(void) {
 #ifdef KMP_STUB
   return __kmps_get_blocktime();
@@ -136,21 +138,24 @@ int FTN_STDCALL FTN_GET_BLOCKTIME(void) {
 
   /* These must match the settings used in __kmp_wait_sleep() */
   if (__kmp_dflt_blocktime == KMP_MAX_BLOCKTIME) {
-    KF_TRACE(10, ("kmp_get_blocktime: T#%d(%d:%d), blocktime=%d\n", gtid,
-                  team->t.t_id, tid, KMP_MAX_BLOCKTIME));
+    KF_TRACE(10, ("kmp_get_blocktime: T#%d(%d:%d), blocktime=%d%cs\n", gtid,
+                  team->t.t_id, tid, KMP_MAX_BLOCKTIME, __kmp_blocktime_units));
     return KMP_MAX_BLOCKTIME;
   }
 #ifdef KMP_ADJUST_BLOCKTIME
   else if (__kmp_zero_bt && !get__bt_set(team, tid)) {
-    KF_TRACE(10, ("kmp_get_blocktime: T#%d(%d:%d), blocktime=%d\n", gtid,
-                  team->t.t_id, tid, 0));
+    KF_TRACE(10, ("kmp_get_blocktime: T#%d(%d:%d), blocktime=%d%cs\n", gtid,
+                  team->t.t_id, tid, 0, __kmp_blocktime_units));
     return 0;
   }
 #endif /* KMP_ADJUST_BLOCKTIME */
   else {
-    KF_TRACE(10, ("kmp_get_blocktime: T#%d(%d:%d), blocktime=%d\n", gtid,
-                  team->t.t_id, tid, get__blocktime(team, tid)));
-    return get__blocktime(team, tid);
+    int bt = get__blocktime(team, tid);
+    if (__kmp_blocktime_units == 'm')
+      bt = bt / 1000;
+    KF_TRACE(10, ("kmp_get_blocktime: T#%d(%d:%d), blocktime=%d%cs\n", gtid,
+                  team->t.t_id, tid, bt, __kmp_blocktime_units));
+    return bt;
   }
 #endif
 }
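
The setter and getter changes above amount to a unit conversion at the API boundary: values are stored in microseconds and presented in whatever units KMP_BLOCKTIME used. The following is a minimal standalone model of that round trip, not runtime code. The names `blocktime_units`, `stored_blocktime_us`, `model_set_blocktime` and `model_get_blocktime` are hypothetical stand-ins, and the INFORM message, the per-thread state, the lower-bound check and the "infinite" special case are deliberately left out.

```cpp
#include <climits>

// Hypothetical stand-ins for __kmp_blocktime_units and the stored blocktime.
static char blocktime_units = 'm';       // 'm' = milliseconds, 'u' = microseconds
static int stored_blocktime_us = 200000; // internal value, always microseconds

// Mirrors the shape of __kmp_aux_convert_blocktime: a value given in ms is
// clamped to INT_MAX / 1000 and then scaled to microseconds.
static int to_internal_us(int bt) {
  if (blocktime_units == 'm') {
    if (bt > INT_MAX / 1000)
      bt = INT_MAX / 1000;
    bt = bt * 1000;
  }
  return bt;
}

// Mirrors the getter: convert back to ms only when ms are the active units.
static int to_user_units(int bt_us) {
  return (blocktime_units == 'm') ? bt_us / 1000 : bt_us;
}

static void model_set_blocktime(int bt) {
  stored_blocktime_us = to_internal_us(bt);
}

static int model_get_blocktime() { return to_user_units(stored_blocktime_us); }
```

In this model, setting 5 with millisecond units stores 5000 and reads back as 5, while setting 5 with microsecond units stores 5 and reads back as 5. Setting INT_MAX with millisecond units stores 2147483000 and reads back as 2147483.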

openmp/runtime/src/kmp_global.cpp

Lines changed: 2 additions & 1 deletion
@@ -154,7 +154,8 @@ int __kmp_hier_max_units[kmp_hier_layer_e::LAYER_LAST + 1];
 int __kmp_hier_threads_per[kmp_hier_layer_e::LAYER_LAST + 1];
 kmp_hier_sched_env_t __kmp_hier_scheds = {0, 0, NULL, NULL, NULL};
 #endif
-int __kmp_dflt_blocktime = KMP_DEFAULT_BLOCKTIME;
+int __kmp_dflt_blocktime = KMP_DEFAULT_BLOCKTIME; // in microseconds
+char __kmp_blocktime_units = 'm'; // Units specified in KMP_BLOCKTIME
 bool __kmp_wpolicy_passive = false;
 #if KMP_USE_MONITOR
 int __kmp_monitor_wakeups = KMP_MIN_MONITOR_WAKEUPS;

openmp/runtime/src/kmp_runtime.cpp

Lines changed: 1 addition & 2 deletions
@@ -8729,9 +8729,8 @@ void __kmp_aux_display_affinity(int gtid, const char *format) {
 }
 
 /* ------------------------------------------------------------------------ */
-
 void __kmp_aux_set_blocktime(int arg, kmp_info_t *thread, int tid) {
-  int blocktime = arg; /* argument is in milliseconds */
+  int blocktime = arg; /* argument is in microseconds */
 #if KMP_USE_MONITOR
   int bt_intervals;
 #endif
