Skip to content

Implement runtime support for device global with init_mode reprogram #161

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 12, 2022

Conversation

sophimao
Copy link
Contributor

@sophimao sophimao commented Sep 14, 2022

  • Change to autodiscovery string: add fields in device global for device global attributes, including host_access, init_mode, and implement_in_csr
  • Change to kernel: if the device to which kernel is to be launched contains device global with init_mode reprogram, force reprogram device before kernel launch

@pcolberg pcolberg added the enhancement New feature or request label Sep 14, 2022
@pcolberg pcolberg added this to the 2023.0 milestone Sep 14, 2022
@sophimao sophimao force-pushed the dev-global-reprogram branch from caa7def to 662b90e Compare September 20, 2022 14:52
@pcolberg pcolberg removed this from the 2023.0 milestone Oct 19, 2022
@sophimao sophimao marked this pull request as ready for review November 18, 2022 16:45
@sophimao
Copy link
Contributor Author

Compiler side change is in the move now, marking this as ready for review.

@pcolberg pcolberg added this to the 2023.1 milestone Nov 18, 2022
@pcolberg pcolberg linked an issue Nov 18, 2022 that may be closed by this pull request
Copy link
Contributor

@pcolberg pcolberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sophimao! The overall change looks good, just some minor stylistic comments.

Could you restructure this into two commits, folding in the clang-format change?

  1. Extend auto-discovery string parsing and unit-test.
  2. Extend kernel reprogramming logic.

For the second commit, is it feasible to add unit testing? If it is not obvious, what do you see as the main blocker?

@pcolberg
Copy link
Contributor

@sophimao, does this PR fully resolve #56?

@sophimao
Copy link
Contributor Author

sophimao commented Nov 21, 2022

@sophimao, does this PR fully resolve #56?

Unfortunately no, #56 tracks the device global copy kernel, which is in PR #65. That PR is also waiting on compiler support.

@sophimao
Copy link
Contributor Author

sophimao commented Nov 21, 2022

For the second commit, is it feasible to add unit testing? If it is not obvious, what do you see as the main blocker?

I looked into adding unit testing for the kernel change, to do that I would need to add another device with device global, that would enable me to run through the modified code, but I don't see a way to ensure the need_reprogram variable to be true (I don't think we can unit test reprogram either). Would that be okay?

Edit: the unit test seems to have a construct for testing device op queue status acl_device_op_test_ctx_t, will see if I can use that to write a unit test

@pcolberg
Copy link
Contributor

pcolberg commented Nov 21, 2022

the unit test seems to have a construct for testing device op queue status acl_device_op_test_ctx_t, will see if I can use that to write a unit test

Yup, the effect of need_program is to submit a ACL_DEVICE_OP_REPROGRAM, which you should be able to verify in the unit test, depending on the availability of device globals and the binary_rand_hash. The acl_kernel_reprogram_scheduler test group should provide ideas on how to implement this.

@sophimao sophimao force-pushed the dev-global-reprogram branch 2 times, most recently from 0459bc9 to a7e0925 Compare November 22, 2022 14:27
Copy link
Contributor

@pcolberg pcolberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sophimao for the revision. Apart from a few minor details, this is ready to be merged once the corresponding compiler change has been tested together with this change.

@sophimao sophimao force-pushed the dev-global-reprogram branch from a7e0925 to daa9492 Compare December 1, 2022 14:17
pcolberg
pcolberg previously approved these changes Dec 3, 2022
Copy link
Contributor

@pcolberg pcolberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sophimao, this is good to be merged once integration testing is completed.

@ChenXingYang2000, you can use the runtime commit hash daa9492 for your testing.

@sophimao sophimao force-pushed the dev-global-reprogram branch 3 times, most recently from 2c1e1bb to 2ad0b64 Compare December 8, 2022 16:44
@sophimao
Copy link
Contributor Author

sophimao commented Dec 8, 2022

Hi @ChenXingYang2000, I have updated the code and sanity checked with the reprogram_trivial test you provided. I might add some more unit tests later but you can use hash 2ad0b64 for testing now.

@ChenXingYang2000
Copy link

Hi @ChenXingYang2000, I have updated the code and sanity checked with the reprogram_trivial test you provided. I might add some more unit tests later but you can use hash 2ad0b64 for testing now.

Thank you for such quick fix, really appreciate it, I will test it out now.

@sophimao
Copy link
Contributor Author

sophimao commented Dec 8, 2022

Hi @pcolberg , I'm done changing the code for now to align with the desired behaviour of this feature, would you mind take a look at this again when you have time? For testing purposes I pushed the last two commits directly but I'll squash them when this change is fully ready. Thank you!

@sophimao sophimao force-pushed the dev-global-reprogram branch from de3d310 to ba2a295 Compare December 8, 2022 20:53
Copy link
Contributor

@pcolberg pcolberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sophimao! Most of this change is dedicated to new tests which is great 🙂

@sophimao sophimao force-pushed the dev-global-reprogram branch from ba2a295 to 9f798f5 Compare December 9, 2022 15:52
…bal with init_mode reprogram

There are two places that checks whether we can skip initial reprogram of the device:
  1) When attempting device eager reprogram
  2) When launching a kernel (for the first time)
Changes to acl_device_binary.cpp tackles the 1) and changes to acl_kernel.cpp tackles 2)
@sophimao sophimao force-pushed the dev-global-reprogram branch from 9f798f5 to 483a019 Compare December 12, 2022 14:34
Copy link
Contributor

@pcolberg pcolberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sophimao!

@sophimao sophimao merged commit 71d527e into intel:main Dec 12, 2022
@sophimao sophimao deleted the dev-global-reprogram branch December 12, 2022 21:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants