Skip to content

Update L0 provider to use getpidfd function #558

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 1, 2024

Conversation

vinser52
Copy link
Contributor

@vinser52 vinser52 commented Jun 20, 2024

Description

As we agreed, while L0 driver has not fixed IPC implementation, in UMF we will use getpidfd approach.
So in this PR does the following:

  • updates L0 provider implementation.
  • refactors OS memory provider and moved common functions to utils.
  • refactors multi-process IPC test and created a version for L0 provider in addition to the test for OS provider.

Checklist

  • Code compiles without errors locally
  • New tests added, especially if they will fail without my changes
  • All newly added source files have a license
  • All newly added source files are referenced in CMake files
  • All tests pass locally
  • CI workflows execute properly

@vinser52 vinser52 requested a review from a team as a code owner June 20, 2024 17:51
@vinser52 vinser52 force-pushed the svinogra_ipc_api branch 2 times, most recently from ab36c40 to 59475d7 Compare June 20, 2024 21:57
@ldorau
Copy link
Contributor

ldorau commented Jun 21, 2024

The umf_example_ipc_ipcapi_anon_fd test fails

@vinser52
Copy link
Contributor Author

The umf_example_ipc_ipcapi_anon_fd test fails

Yeah, looks like it fails because utils_duplicate_fd: __NR_pidfd_open or __NR_pidfd_getfd not available. It is strange, did this test pass before on Ubuntu 20.04?

@vinser52
Copy link
Contributor Author

The umf_example_ipc_ipcapi_anon_fd test fails

Yeah, looks like it fails because utils_duplicate_fd: __NR_pidfd_open or __NR_pidfd_getfd not available. It is strange, did this test pass before on Ubuntu 20.04?

Ok, I understand the issue. Prior to this PR we returned Not supported if required syscalls are not available, This PR just returns UMF_RESULT_ERROR_MEMORY_PROVIDER_SPECIFIC. I will fix it to keep the current behavior.

Comment on lines 71 to 84
switch (errno) {
case EINVAL:
case ESRCH:
return UMF_RESULT_ERROR_INVALID_ARGUMENT;
case EMFILE:
case ENOMEM:
return UMF_RESULT_ERROR_OUT_OF_HOST_MEMORY;
case ENODEV:
case ENOSYS:
case ENOTSUP:
return UMF_RESULT_ERROR_NOT_SUPPORTED;
default:
return UMF_RESULT_ERROR_UNKNOWN;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create errnoToUmfResult function and use it here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


if (ze_ipc_data->pid != utils_getpid()) {
int fd_remote = -1;
memcpy(&fd_remote, &ze_ipc_handle, sizeof(fd_remote));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using memcpy to copy single int is an overkill

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compiler optimizes it. When the size is low enough there is now an actual call to memcpy, it is replaced with move instructions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more code readability issue then performance. Assignment is shorter, and easier to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regarding readability, I am not sure:

fd_remote = *(int*)&ze_ipc_handle;

vs

memcpy(&fd_remote, &ze_ipc_handle, sizeof(fd_remote));

and

*((int*)&ze_ipc_handle) = fd_local;

vs

memcpy(&ze_ipc_handle, &fd_local, sizeof(fd_local));

Are you still suggesting to get rid of memcpy? If so, I will do that. Please confirm.

Copy link
Contributor

@lplewa lplewa Jul 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This IPC api in L0 is awful....

I checked definition of ze_ipc_mem_handle_t and as this is opaque struct, memcpy here make sense here.

NIT: We should consider something like this - as working on opaque structs without clear definition of it's layout is quite error prone.

union umf_ipc_mem_handle {
    ze_ipc_mem_handle_t ze;
    struct {
        int fd;
    } def;
};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all this is needed just because L0 IPC implementation is broken.
According to the L0 spec all these steps not needed at all. When L0 driver fix their implementation we will get rid of these changes.

@vinser52 vinser52 force-pushed the svinogra_ipc_api branch 2 times, most recently from 67c657c to 9bddf8e Compare June 28, 2024 09:12
@ldorau ldorau requested review from lplewa, KFilipek and kswiecicki June 28, 2024 09:19
@ldorau
Copy link
Contributor

ldorau commented Jul 1, 2024

@vinser52 Please rebase. There is a strange message here:
"This branch is out-of-date with the base branch
Merge the latest changes from main into this branch.
"

@lukaszstolarczuk
Copy link
Contributor

@vinser52 Please rebase. There is a strange message here: "This branch is out-of-date with the base branch Merge the latest changes from main into this branch."

That's a new addition. I'm testing a few security options to fix the issue: #580 This precise option seem a little annoying (but perhaps maintainers can just do the update before merging) - I will keep investigating these security options

@vinser52 vinser52 force-pushed the svinogra_ipc_api branch from 8725725 to c5d2d94 Compare July 1, 2024 10:58
@vinser52
Copy link
Contributor Author

vinser52 commented Jul 1, 2024

@vinser52 Please rebase. There is a strange message here: "This branch is out-of-date with the base branch Merge the latest changes from main into this branch."

Done.

@ldorau
Copy link
Contributor

ldorau commented Jul 1, 2024

@lplewa please respond to your issues

@vinser52 vinser52 force-pushed the svinogra_ipc_api branch from c5d2d94 to a0850d8 Compare July 1, 2024 12:08
@ldorau ldorau merged commit 377f6fa into oneapi-src:main Jul 1, 2024
73 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants