Skip to content

[SYCL][FPGA] Fix invalid memory copying of struct using fpga_reg #3865

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

mlychkov
Copy link
Contributor

@mlychkov mlychkov commented Jun 2, 2021

When fpga_reg builtin is called with object of record type there is
an extra memcpy call is generated with invalid size parameter value.
Remove redundant memcpy call.

Signed-off-by: Mikhail Lychkov [email protected]

When fpga_reg builtin is called with object of record type there is
an extra memcpy call is generated with invalid size parameter value.
Remove redundant memcpy call.

Signed-off-by: Mikhail Lychkov <[email protected]>
Signed-off-by: Mikhail Lychkov <[email protected]>
@mlychkov mlychkov requested review from MrSidims and AGindinson June 2, 2021 11:25
@mlychkov
Copy link
Contributor Author

mlychkov commented Jun 2, 2021

It would be nice to have approve from @ajaykumarkannan.

@MrSidims
Copy link
Contributor

MrSidims commented Jun 2, 2021

I will be honest with you: I don't really understand what is happening here (and I would withdraw from the review if possible :) ). So the issue is that we don't need the memcpy, or that the size of the memcpy is incorrect? And if the issue is with redundant memcpy call, then what affect has the addition of char structure member in the test?

@mlychkov
Copy link
Contributor Author

mlychkov commented Jun 3, 2021

I will be honest with you: I don't really understand what is happening here (and I would withdraw from the review if possible :) ). So the issue is that we don't need the memcpy, or that the size of the memcpy is incorrect? And if the issue is with redundant memcpy call, then what affect has the addition of char structure member in the test?

Currently the following code is generated for this intrinsic:
void fpga_reg(T *dest, T *src) { T local_copy; memcpy(&local_copy, src, sizeof(*src)); memcpy(dest, &local_copy, sizeof(local_copy)); }
Root cause of initial issue was in incorrect memory size of second memcpy call. It was set to size of a pointer type (&local_copy), 8 bytes. That is why I decided to change the size of class in LIT test to a value that differs from 8 bytes.
Later I thought that this local copy is not neccessary here and should be generated in BE compiler for FPGA if required. And just removed generation of local_copy object.

@ajaykumarkannan
Copy link
Contributor

Change looks good to me!

Copy link
Contributor

@MrSidims MrSidims left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@premanandrao premanandrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just have a question to help my understanding.

// CHECK-NEXT: [[TMP_S10:%.*]] = bitcast i8 addrspace(4)* [[TMP_S9]] to [[T_ST]] addrspace(4)*
// CHECK-NEXT: [[TMP_S11:%.*]] = bitcast [[T_ST]] addrspace(4)* [[S3_ASCAST]] to i8 addrspace(4)*
// CHECK-NEXT: [[TMP_S12:%.*]] = bitcast [[T_ST]] addrspace(4)* [[REF_TMP_ASCAST]] to i8 addrspace(4)*
// CHECK-NEXT: call void @llvm.memcpy.p4i8.p4i8.i64(i8 addrspace(4)* align 4 [[TMP_S11]], i8 addrspace(4)* align 4 [[TMP_S12]], i64 12, i1 false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are two memcpy's still needed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AST for the following instruction: struct st s2 = __builtin_intel_fpga_reg(s1); is:

DeclStmt 0x555565a32cf8
`-VarDecl 0x555565a326d0  used s2 'struct st':'struct st' cinit
  `-ExprWithCleanups 0x555565a32ce0 'struct st':'struct st'
    `-CXXConstructExpr 0x555565a32cb0 'struct st':'struct st' 'void (struct st &&) noexcept' elidable
      `-MaterializeTemporaryExpr 0x555565a329e8 'struct st':'struct st' xvalue
        `-CallExpr 0x555565a329c0 'struct st':'struct st'
          |-ImplicitCastExpr 0x555565a329a8 'void (*)(...) noexcept' <BuiltinFnToFnPtr>
          | `-DeclRefExpr 0x555565a32910 '<builtin fn type>' Function 0x555565a327f0 '__builtin_intel_fpga_reg' 'void (...) noexcept'
          `-DeclRefExpr 0x555565a32930 'struct st':'struct st' lvalue Var 0x555565a0b090 's1' 'struct st':'struct st'

AST for the following instruction: s3 = __builtin_intel_fpga_reg(s2); is:

ExprWithCleanups 0x555565a34a78 'struct st' lvalue
`-CXXOperatorCallExpr 0x555565a34a40 'struct st' lvalue '='
  |-ImplicitCastExpr 0x555565a34a28 'struct st &(*)(struct st &&) noexcept' <FunctionToPointerDecay>
  | `-DeclRefExpr 0x555565a331d0 'struct st &(struct st &&) noexcept' lvalue CXXMethod 0x555565a33028 'operator=' 'struct st &(struct st &&) noexcept'
  |-DeclRefExpr 0x555565a32dd8 'struct st':'struct st' lvalue Var 0x555565a32d30 's3' 'struct st':'struct st'
  `-MaterializeTemporaryExpr 0x555565a331b8 'struct st':'struct st' xvalue
    `-CallExpr 0x555565a32e50 'struct st':'struct st'
      |-ImplicitCastExpr 0x555565a32e38 'void (*)(...) noexcept' <BuiltinFnToFnPtr>
      | `-DeclRefExpr 0x555565a32df8 '<builtin fn type>' Function 0x555565a327f0 '__builtin_intel_fpga_reg' 'void (...) noexcept'
      `-DeclRefExpr 0x555565a32e18 'struct st':'struct st' lvalue Var 0x555565a326d0 's2' 'struct st':'struct st'

In second case clang generates temporary variable ref.tmp to store value of s2 and call __builtin_intel_fpga_reg with ref.tmp argument instead of s2. I suppose this is generic code generation pattern in clang, but I don't have much llvm/clang knowledge to eliminate generation of temporary variable in this particular case for __builtin_intel_fpga_reg intrinsic.

@mlychkov mlychkov marked this pull request as draft June 8, 2021 13:48
@mlychkov
Copy link
Contributor Author

mlychkov commented Jun 8, 2021

Moved to draft to validate that FPGA backend is not affected by this change.

@mlychkov mlychkov marked this pull request as ready for review August 18, 2021 14:18
@mlychkov mlychkov requested a review from premanandrao August 18, 2021 14:18
@mlychkov
Copy link
Contributor Author

@AaronBallman, @elizabethandrews Could you please review this patch?

Copy link
Contributor

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@bader bader merged commit a5d290d into intel:sycl Aug 18, 2021
@mlychkov mlychkov deleted the private/mlychkov/fix_fpga_reg_copy_for_structs branch September 2, 2021 08:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants