-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL][FPGA] Fix invalid memory copying of struct using fpga_reg #3865
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][FPGA] Fix invalid memory copying of struct using fpga_reg #3865
Conversation
When fpga_reg builtin is called with object of record type there is an extra memcpy call is generated with invalid size parameter value. Remove redundant memcpy call. Signed-off-by: Mikhail Lychkov <[email protected]>
Signed-off-by: Mikhail Lychkov <[email protected]>
It would be nice to have approve from @ajaykumarkannan. |
I will be honest with you: I don't really understand what is happening here (and I would withdraw from the review if possible :) ). So the issue is that we don't need the memcpy, or that the size of the memcpy is incorrect? And if the issue is with redundant memcpy call, then what affect has the addition of |
Currently the following code is generated for this intrinsic: |
Change looks good to me! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just have a question to help my understanding.
// CHECK-NEXT: [[TMP_S10:%.*]] = bitcast i8 addrspace(4)* [[TMP_S9]] to [[T_ST]] addrspace(4)* | ||
// CHECK-NEXT: [[TMP_S11:%.*]] = bitcast [[T_ST]] addrspace(4)* [[S3_ASCAST]] to i8 addrspace(4)* | ||
// CHECK-NEXT: [[TMP_S12:%.*]] = bitcast [[T_ST]] addrspace(4)* [[REF_TMP_ASCAST]] to i8 addrspace(4)* | ||
// CHECK-NEXT: call void @llvm.memcpy.p4i8.p4i8.i64(i8 addrspace(4)* align 4 [[TMP_S11]], i8 addrspace(4)* align 4 [[TMP_S12]], i64 12, i1 false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are two memcpy's still needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AST for the following instruction: struct st s2 = __builtin_intel_fpga_reg(s1);
is:
DeclStmt 0x555565a32cf8
`-VarDecl 0x555565a326d0 used s2 'struct st':'struct st' cinit
`-ExprWithCleanups 0x555565a32ce0 'struct st':'struct st'
`-CXXConstructExpr 0x555565a32cb0 'struct st':'struct st' 'void (struct st &&) noexcept' elidable
`-MaterializeTemporaryExpr 0x555565a329e8 'struct st':'struct st' xvalue
`-CallExpr 0x555565a329c0 'struct st':'struct st'
|-ImplicitCastExpr 0x555565a329a8 'void (*)(...) noexcept' <BuiltinFnToFnPtr>
| `-DeclRefExpr 0x555565a32910 '<builtin fn type>' Function 0x555565a327f0 '__builtin_intel_fpga_reg' 'void (...) noexcept'
`-DeclRefExpr 0x555565a32930 'struct st':'struct st' lvalue Var 0x555565a0b090 's1' 'struct st':'struct st'
AST for the following instruction: s3 = __builtin_intel_fpga_reg(s2);
is:
ExprWithCleanups 0x555565a34a78 'struct st' lvalue
`-CXXOperatorCallExpr 0x555565a34a40 'struct st' lvalue '='
|-ImplicitCastExpr 0x555565a34a28 'struct st &(*)(struct st &&) noexcept' <FunctionToPointerDecay>
| `-DeclRefExpr 0x555565a331d0 'struct st &(struct st &&) noexcept' lvalue CXXMethod 0x555565a33028 'operator=' 'struct st &(struct st &&) noexcept'
|-DeclRefExpr 0x555565a32dd8 'struct st':'struct st' lvalue Var 0x555565a32d30 's3' 'struct st':'struct st'
`-MaterializeTemporaryExpr 0x555565a331b8 'struct st':'struct st' xvalue
`-CallExpr 0x555565a32e50 'struct st':'struct st'
|-ImplicitCastExpr 0x555565a32e38 'void (*)(...) noexcept' <BuiltinFnToFnPtr>
| `-DeclRefExpr 0x555565a32df8 '<builtin fn type>' Function 0x555565a327f0 '__builtin_intel_fpga_reg' 'void (...) noexcept'
`-DeclRefExpr 0x555565a32e18 'struct st':'struct st' lvalue Var 0x555565a326d0 's2' 'struct st':'struct st'
In second case clang generates temporary variable ref.tmp
to store value of s2
and call __builtin_intel_fpga_reg
with ref.tmp
argument instead of s2
. I suppose this is generic code generation pattern in clang, but I don't have much llvm/clang knowledge to eliminate generation of temporary variable in this particular case for __builtin_intel_fpga_reg
intrinsic.
Moved to draft to validate that FPGA backend is not affected by this change. |
@AaronBallman, @elizabethandrews Could you please review this patch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
When fpga_reg builtin is called with object of record type there is
an extra memcpy call is generated with invalid size parameter value.
Remove redundant memcpy call.
Signed-off-by: Mikhail Lychkov [email protected]