Skip to content

[OpenMP] Fix various alignment issues #142376

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 25, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions openmp/runtime/src/kmp_alloc.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -70,10 +70,10 @@ static void bectl(kmp_info_t *th, bget_compact_t compact,
/* Buffer allocation size quantum: all buffers allocated are a
multiple of this size. This MUST be a power of two. */

/* On IA-32 architecture with Linux* OS, malloc() does not
ensure 16 byte alignment */
/* On some architectures, malloc() does not ensure 16 byte alignment,
Solaris/sparc and x86 among them. */

#if KMP_ARCH_X86 || !KMP_HAVE_QUAD
#if KMP_ARCH_X86 || KMP_ARCH_SPARC || !KMP_HAVE_QUAD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to add anything to the comment above this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this is complete mess: glibc sysdeps/i386/malloc-alignment. does #define MALLOC_ALIGNMENT 16, contrary to the comment (haven't checked if this has changed historically). Besides, KMP_HAVE_QUAD is a complete misnomer: it's only defined on x86 targets, so should probably be renamed to KMP_X86_HAVE_QUAD to reflect that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this comment now, removing the incorrect Linux reference.

As it turns out, malloc alignment is a total can of worms:

  • In theory, the address returned should be aligned to alignof(max_align_t).
  • However, implementations often differ. E.g. Solaris 32-bit libc malloc only uses 8-byte alignment. At the time this was introduced, __float128 didn't exist on x86 yet and it was considered too risky to raise this later for fear of breaking binary compatiblity. Similarly, SPARC uses 8-byte alignment, too, although one could argue that this should be 16 bytes due to the use of 128-bit long double.
  • Many platforms have different malloc implementions that often differ from the one in libc.
  • The actual alignment cannot be queried at build or runtime, unfortunately.

That said, it might be prudent to switch to one of memalign, posix_memalign, or _aligned_malloc. GCC's libgomp (in libgomp/alloc.c:gomp_aligned_alloc) went that route, e.g.


#define SizeQuant 8
#define AlignType double
Expand Down Expand Up @@ -1861,7 +1861,7 @@ typedef struct kmp_mem_desc { // Memory block descriptor
void *ptr_align; // Pointer to aligned memory, returned
kmp_allocator_t *allocator; // allocator
} kmp_mem_desc_t;
static int alignment = sizeof(void *); // align to pointer size by default
constexpr size_t alignment = SizeQuant;

// external interfaces are wrappers over internal implementation
void *__kmpc_alloc(int gtid, size_t size, omp_allocator_handle_t allocator) {
Expand Down
2 changes: 1 addition & 1 deletion openmp/runtime/src/kmp_csupport.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -570,7 +570,7 @@ void __kmpc_fork_teams(ident_t *loc, kmp_int32 argc, kmpc_micro microtask,

this_thr->th.th_teams_microtask = NULL;
this_thr->th.th_teams_level = 0;
*(kmp_int64 *)(&this_thr->th.th_teams_size) = 0L;
memset(&this_thr->th.th_teams_size, 0, sizeof(kmp_teams_size_t));
va_end(ap);
#if KMP_STATS_ENABLED
if (previous_state == stats_state_e::SERIAL_REGION) {
Expand Down
2 changes: 1 addition & 1 deletion openmp/runtime/src/kmp_tasking.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1528,7 +1528,7 @@ kmp_task_t *__kmp_task_alloc(ident_t *loc_ref, kmp_int32 gtid,
// Calculate shared structure offset including padding after kmp_task_t struct
// to align pointers in shared struct
shareds_offset = sizeof(kmp_taskdata_t) + sizeof_kmp_task_t;
shareds_offset = __kmp_round_up_to_val(shareds_offset, sizeof(void *));
shareds_offset = __kmp_round_up_to_val(shareds_offset, sizeof(kmp_uint64));

// Allocate a kmp_taskdata_t block and a kmp_task_t block.
KA_TRACE(30, ("__kmp_task_alloc: T#%d First malloc size: %ld\n", gtid,
Expand Down
Loading