Skip to content

[SYCL][USM] Make host device return nullptr for bad mallocs (huge, 0, etc) #988

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jan 23, 2020

Conversation

jbrodman
Copy link
Contributor

Host device wasn't properly handling alloc::unknown.
Also, aligned allocator wasn't failing when it really should, so check to throw an error.

Signed-off-by: James Brodman [email protected]

Signed-off-by: James Brodman <[email protected]>
bader
bader previously approved these changes Jan 9, 2020
bader
bader previously approved these changes Jan 10, 2020
@bader
Copy link
Contributor

bader commented Jan 14, 2020

@jbrodman, do you know why new test fails on Windows/GPU?

romanovvlad
romanovvlad previously approved these changes Jan 14, 2020
@jbrodman
Copy link
Contributor Author

@jbrodman, do you know why new test fails on Windows/GPU?

I do not, but building on windows takes forever. Looking into it.

@jbrodman
Copy link
Contributor Author

jbrodman commented Jan 14, 2020

@jbrodman, do you know why new test fails on Windows/GPU?

I do not, but building on windows takes forever. Looking into it.

@bader Ok - I was able to get everything built on windows again. I updated my DCH Windows drivers to version 26.20.100.7584 from 12/18/2019 (aka latest public version) and the tests pass, so I dunno what the deal is.

@bader
Copy link
Contributor

bader commented Jan 15, 2020

Interesting... buildbot machines uses 26.20.100.7372 driver and @jbrodman validated that 26.20.100.7584 driver has no issues. I don't understand why driver version might be the problem as new test should not call any driver API, unless I missing something. The tested functions should return at argument validation before calling driver API. Right?

@jbrodman
Copy link
Contributor Author

Interesting... buildbot machines uses 26.20.100.7372 driver and @jbrodman validated that 26.20.100.7584 driver has no issues. I don't understand why driver version might be the problem as new test should not call any driver API, unless I missing something. The tested functions should return at argument validation before calling driver API. Right?

The changes only checked for overflow conditions really. The older driver might still have not been dying when it should've? Modified the test so we can get a better idea of where it's failing.

Signed-off-by: James Brodman <[email protected]>
@jbrodman
Copy link
Contributor Author

jbrodman commented Jan 17, 2020

Yeah I'm pretty sure this is dying in NEO (which is barfing on sz = -1). When are we updating the Win GPU drivers?

Can we blacklist this test on windows for now?

@bader
Copy link
Contributor

bader commented Jan 17, 2020

Yeah I'm pretty sure this is dying in NEO (which is barfing on sz = -1).

Why? Don't this change prevents SYCL runtime from using lower level API?

    if (Size > NumBytes)
      throw std::bad_alloc();

Maybe we missing similar check in other functions...

@jbrodman
Copy link
Contributor Author

Yeah I'm pretty sure this is dying in NEO (which is barfing on sz = -1).

Why? Don't this change prevents SYCL runtime from using lower level API?

    if (Size > NumBytes)
      throw std::bad_alloc();

Maybe we missing similar check in other functions...

The test passes on the HOST device. It fails on the GPU device b/c of old windows drivers. Native device support doesn't do the funky alignment massaging your aligned_allocator does, so that fix is not relevant for those devices. (or rather, they need to do the right thing inside their own APIs)

@romanovvlad
Copy link
Contributor

Yeah I'm pretty sure this is dying in NEO (which is barfing on sz = -1).

Why? Don't this change prevents SYCL runtime from using lower level API?

    if (Size > NumBytes)
      throw std::bad_alloc();

Maybe we missing similar check in other functions...

The test passes on the HOST device. It fails on the GPU device b/c of old windows drivers. Native device support doesn't do the funky alignment massaging your aligned_allocator does, so that fix is not relevant for those devices. (or rather, they need to do the right thing inside their own APIs)

Suggest disabling the test to unblock commit.
You can disable entire LIST test on windows by adding
// UNSUPPORTED: windows
or disable run on GPU on both OSes by breaking RUN word
// RUNx: %GPU_RUN_PLACEHOLDER %t.out

@jbrodman
Copy link
Contributor Author

Yeah I'm pretty sure this is dying in NEO (which is barfing on sz = -1).

Why? Don't this change prevents SYCL runtime from using lower level API?

    if (Size > NumBytes)
      throw std::bad_alloc();

Maybe we missing similar check in other functions...

The test passes on the HOST device. It fails on the GPU device b/c of old windows drivers. Native device support doesn't do the funky alignment massaging your aligned_allocator does, so that fix is not relevant for those devices. (or rather, they need to do the right thing inside their own APIs)

Suggest disabling the test to unblock commit.
You can disable entire LIST test on windows by adding
// UNSUPPORTED: windows
or disable run on GPU on both OSes by breaking RUN word
// RUNx: %GPU_RUN_PLACEHOLDER %t.out

Thanks. Will just disable on windows for now.

@bader bader merged commit 2a000d9 into intel:sycl Jan 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants