Check if malloc_device return nullptr in matmul code #1493

Merged

Conversation

oleksandr-pavlyk
Contributor

Check that sycl::malloc_device did not return nullptr to signal allocation failure.

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • If this PR is a work in progress, are you opening the PR as a draft?
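
The fix itself lives in the C++ matmul code, where a `nullptr` returned by `sycl::malloc_device` now signals allocation failure instead of being dereferenced. The same pattern, sketched in Python with a hypothetical allocator purely for illustration:

```python
def alloc_or_raise(alloc, nbytes):
    """Call an allocator that signals failure by returning None
    (nullptr in the C++ code) and convert that into an exception,
    mirroring the guard this PR adds after sycl::malloc_device."""
    ptr = alloc(nbytes)
    if ptr is None:  # allocation failed
        raise MemoryError(f"USM device allocation of {nbytes} bytes failed")
    return ptr

# A working allocator returns a handle; a failing one returns None.
assert alloc_or_raise(lambda n: object(), 1024) is not None
try:
    alloc_or_raise(lambda n: None, 1 << 60)
except MemoryError as exc:
    print("caught:", exc)
```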

@oleksandr-pavlyk oleksandr-pavlyk changed the title Check if malloc_device return nullptr Check if malloc_device return nullptr in matmul code Jan 11, 2024

Array API standard conformance tests for dpctl=0.15.1dev2=py310h15de555_72 ran successfully.
Passed: 887
Failed: 45
Skipped: 59

@ndgrigorian ndgrigorian merged commit 7875f38 into feature/matmul-vecdot-tensordot Jan 11, 2024

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

@oleksandr-pavlyk oleksandr-pavlyk deleted the check-memory-allocation branch January 11, 2024 20:10
ndgrigorian added a commit that referenced this pull request Jan 23, 2024
…1490)

* Adds ThreeOffsets_CombinedIndexer

This enables strided data processing by gemm kernels

* Remove unused `elementwise_functions.cpp`

* Implements `matmul`, `vecdot`, and `tensordot`

These three functions are implemented through a common `py_dot` binding,
which is also part of a new tensor submodule `_tensor_linalg_impl`

* Tweaks to `matmul` and gemm kernels

Fixes a missing indexer in gemm functor with threading along `nm` dimensions

Fixes `matmul` broadcasting, which was broadcasting in some unnecessary cases

* Remove double-counting of batch offset in gemm batch tree reduction

* Fixes missing dependency in vecdot

When the first argument would not be cast but the second argument would be, the copy dependency was not appended to the list of dependencies, creating a race condition
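
The bookkeeping bug can be sketched in Python (event names hypothetical): before the fix, the second copy's event was only recorded on the branch where the first argument was also cast, so casting the second argument alone left its event out of the dependency list.

```python
def collect_copy_deps(arg1_needs_cast, arg2_needs_cast):
    """Fixed dependency bookkeeping: each cast/copy event is appended
    independently of whether the other argument was cast."""
    deps = []
    if arg1_needs_cast:
        deps.append("copy_ev1")  # event of the first argument's copy
    if arg2_needs_cast:
        deps.append("copy_ev2")  # appended unconditionally of arg1
    return deps

# The case that previously lost its dependency:
assert collect_copy_deps(False, True) == ["copy_ev2"]
```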

* Run test_matmul_simple2 in Windows before full test suite

Part of triaging crashes on Windows

* Test removing test_matmul_simple leaving only test_matmul_simple2

* Fix incorrect comments throughout gemm kernels

Comments incorrectly stated that the third argument to `scale_gemm_k_parameters` is modified by reference

* Drastically reduced parameters used for gemm kernels which thread over k

Experimental change to see if this stabilizes CI

* Test removal of k-threading gemm kernel which writes to multiple outputs atomically

* Refactors `gemm_tree_impl`

Now uses two smaller functions, `gemm_tree_k_impl` and
`gemm_tree_nm_impl` for greater readability

* Reverse order of numeric types passed to test_matmul_simple2

May improve stability on CPU

* Refactors `gemm_contig_tree_impl`

`gemm_contig_tree_impl` now calls new functions `gemm_contig_tree_k_impl` and
`gemm_contig_tree_nm_impl`

* Refactoring `gemm_batch_tree` functions

Adds new functions for calling `nm` threading and `k` threading
kernels to improve readability

* Test reversing data types for `test_matmul_strided`

* pre-commit fixes in `gemm.hpp`

* Check if malloc_device return nullptr (#1493)

* Add step to Linux conda package workflow to run `test_matmul_strided` under gdb

Part of triaging CPU crashes

* Remove unnecessary comments

* Adds a fast-path for empty (k = 0) gemm kernels
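
The fast path follows from the reduction being empty: when k = 0, every output element is a sum over zero terms, so the result is all zeros and no gemm kernel needs to launch. A pure-Python reference sketch of the idea:

```python
def matmul_ref(a, b, m, k, n):
    """Reference m x k by k x n product over flat row-major lists."""
    if k == 0:
        # Fast path: an empty reduction yields an all-zeros result.
        return [0] * (m * n)
    out = [0] * (m * n)
    for i in range(m):
        for j in range(n):
            out[i * n + j] = sum(a[i * k + p] * b[p * n + j] for p in range(k))
    return out

assert matmul_ref([], [], 2, 0, 3) == [0, 0, 0, 0, 0, 0]
```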

* Adds logic that avoids certain kernels on CPU that are known to be problematic

Specifically uses logic to always avoid paths which would
call k threaded functors on CPU with m_groups > 1

* Only access memory when indices are in range

This prevents out-of-bound access that was responsible for crashes
observed in CI.

* Simplified computation of m_id/gr_id in kernels

No need to use both it.get_global_linear_id() and
it.get_group_linear_id() to compute batch id and
group id.

* Change generic kernels to work for any value of m_groups, not just m_groups=2

* Remove work-arounds/special-casing for CPUs

* Extended test_matmul_strided, reverted work-arounds

* Revert remaining gemm work-arounds

This commit removes remaining checks for if a kernel is called on CPU
as well as reverting hyperparameters for gemm kernels to their original
values

* Revert tuning down of `gemm` kernel parameters

* Removed logically dead code from _linear_algebra_functions.py

* Added more tests to improve coverage of _linear_algebra_functions

* Fixed "UnboundLocalError: local variable 'buf1_dt' referenced before assignment"

Initialized buf1_dt and buf2_dt to None
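
The shape of the fix: when both names are only assigned inside conditional branches, reading them afterward raises `UnboundLocalError` if no branch ran. Initializing them to `None` up front makes the later checks well-defined. A minimal sketch (dtype strings hypothetical):

```python
def pick_buffer_types(needs_cast1, needs_cast2):
    """Initializing both names up front means later
    'if buf1_dt is None' checks are always well-defined,
    even when neither branch assigns them."""
    buf1_dt = None
    buf2_dt = None
    if needs_cast1:
        buf1_dt = "f4"
    if needs_cast2:
        buf2_dt = "f4"
    return buf1_dt, buf2_dt

assert pick_buffer_types(False, False) == (None, None)
```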

* More tests to improve coverage

* Removed more dead branches in _linear_algebra_functions.py

* `tensordot` now properly rejects negative `axes`

Per the array API specification, negative axes are not permitted
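
A sketch of the validation for the integer form of `axes` (error text hypothetical, tuple forms omitted):

```python
def validate_tensordot_axes(axes):
    """Per the array API spec, `axes` given as an int must be
    non-negative; negative values raise rather than wrap around."""
    if not isinstance(axes, int):
        raise TypeError("this sketch only covers the integer form of `axes`")
    if axes < 0:
        raise ValueError("`axes` must be a non-negative integer")
    return axes

assert validate_tensordot_axes(2) == 2
try:
    validate_tensordot_axes(-1)
except ValueError:
    print("negative axes rejected")
```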

* Adds `test_tensordot_type_matrix` to `test_usm_ndarray_linalg.py`

* Addresses flaws in gemm tree kernel logic

Previously, assertions for calling a full tree reduction with only a single work-group of elements could be tripped

The kernel logic has been changed such that this is no longer possible

* Implements `__matmul__`, `__imatmul__`, and `__rmatmul__` operators for usm_ndarray
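
These are the three special methods Python consults for the `@` operator; a minimal stand-in class (not the usm_ndarray implementation, which dispatches to `matmul`) shows which call backs each form:

```python
class Mat:
    """Minimal stand-in showing which special methods back `@`."""
    def __init__(self, name):
        self.name = name

    def __matmul__(self, other):   # self @ other
        return Mat(f"({self.name} @ {other.name})")

    def __rmatmul__(self, other):  # other @ self, when other defers
        return Mat(f"({other!r} @ {self.name})")

    def __imatmul__(self, other):  # self @= other, in-place form
        self.name = f"({self.name} @ {other.name})"
        return self

a, b = Mat("a"), Mat("b")
assert (a @ b).name == "(a @ b)"
a @= b
assert a.name == "(a @ b)"
```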

* Makes usm_ndarray operator argument names consistent

* Test changes for `tensordot`

Adds a test for axes errors in `tensordot` for negative axes

Incorporates test for `tensordot` promotion of both inputs into `test_tensordot_type_promotion`

* Reverts running certain `matmul` tests under gdb

* Fix to typo in `test_tensordot_promotion`

* Removes unnecessary input type checks in `matmul`

* More tests added to `test_usm_linalg.py`

Adds several tests for `matmul` and expands some `tensordot` and `vecdot` tests to improve coverage

* Use result_type with tensors to take device capability into account

* Use order keyword in test of type promotion for matmul

* Make generic k-threaded kernels handle arbitrary m_groups

Also increases hyper-parameters for k-threaded kernels to improve performance

* Adjusted dispatch logic for gemm kernels

Now uses m_groups = 4 when m > 4, and otherwise, m_groups = 1 to improve performance
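
The dispatch rule stated above is a one-line threshold; sketched as a helper (name hypothetical):

```python
def choose_m_groups(m):
    """Dispatch rule from the commit message: vectorize over 4 rows
    when the m dimension is large enough to fill them, else use 1."""
    return 4 if m > 4 else 1

assert choose_m_groups(16) == 4
assert choose_m_groups(3) == 1
```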

---------

Co-authored-by: Oleksandr Pavlyk <[email protected]>