Check if malloc_device return nullptr in matmul code #1493
Merged: ndgrigorian merged 1 commit into feature/matmul-vecdot-tensordot from check-memory-allocation on Jan 11, 2024.
Conversation
View rendered docs @ https://intelpython.github.io/dpctl/pulls/1493/index.html
Array API standard conformance tests for dpctl=0.15.1dev2=py310h15de555_72 ran successfully.
Deleted rendered PR docs from intelpython.github.io/dpctl; latest should be updated shortly. 🤞
ndgrigorian added a commit that referenced this pull request on Jan 23, 2024:
…1490)

* Adds ThreeOffsets_CombinedIndexer. This enables strided data processing by gemm kernels.
* Remove unused `elementwise_functions.cpp`.
* Implements `matmul`, `vecdot`, and `tensordot`. These three functions are implemented through a common `py_dot` binding, which is also part of a new tensor submodule, `_tensor_linalg_impl`.
* Tweaks to `matmul` and gemm kernels. Fixes a missing indexer in the gemm functor with threading along the `nm` dimensions, and fixes `matmul` broadcasting, which was broadcasting in some unnecessary cases.
* Remove double-counting of the batch offset in gemm batch tree reduction.
* Fixes missing dependency in `vecdot`. When the first argument would not be cast and the second argument would be, the copy dependency was not appended to the list of dependencies, creating race conditions.
* Run test_matmul_simple2 on Windows before the full test suite. Part of triaging crashes on Windows.
* Test removing test_matmul_simple, leaving only test_matmul_simple2.
* Fix incorrect comments throughout gemm kernels. Comments incorrectly stated that the third argument to `scale_gemm_k_parameters` is modified by reference.
* Drastically reduced parameters used for gemm kernels which thread over k. Experimental change to see if this stabilizes CI.
* Test removal of the k-threading gemm kernel which writes to multiple outputs atomically.
* Refactors `gemm_tree_impl`. Now uses two smaller functions, `gemm_tree_k_impl` and `gemm_tree_nm_impl`, for greater readability.
* Reverse the order of numeric types passed to test_matmul_simple2. May improve stability on CPU.
* Refactors `gemm_contig_tree_impl`. `gemm_contig_tree_impl` now calls new functions `gemm_contig_tree_k_impl` and `gemm_contig_tree_nm_impl`.
* Refactoring `gemm_batch_tree` functions. Adds new functions for calling `nm`-threading and `k`-threading kernels to improve readability.
* Test reversing data types for `test_matmul_strided`.
* pre-commit fixes in `gemm.hpp`.
* Check if malloc_device returns nullptr (#1493).
* Add a step to the Linux conda package workflow to run `test_matmul_strided` under gdb. Part of triaging CPU crashes.
* Remove unnecessary comments.
* Adds a fast path for empty (k = 0) gemm kernels.
* Adds logic that avoids certain kernels on CPU that are known to be problematic. Specifically, always avoids paths which would call k-threaded functors on CPU with m_groups > 1.
* Access memory only if indices are in range. This prevents the out-of-bounds accesses that were responsible for crashes observed in CI (see the sketch after this list).
* Simplified computation of m_id/gr_id in kernels. No need to use both it.get_global_linear_id() and it.get_group_linear_id() to compute the batch id and group id.
* Change generic kernels to work for any value of m_groups, not just m_groups = 2.
* Remove work-arounds/special-casing for CPUs.
* Extended test_matmul_strided, reverted work-arounds.
* Revert remaining gemm work-arounds. This removes the remaining checks for whether a kernel is called on CPU, and reverts hyperparameters for gemm kernels to their original values.
* Revert tuning down of `gemm` kernel parameters.
* Removed logically dead code from _linear_algebra_functions.py.
* Added more tests to improve coverage of _linear_algebra_functions.
* Fixed "UnboundLocalError: local variable 'buf1_dt' referenced before assignment". Initialized buf1_dt and buf2_dt to None.
* More tests to improve coverage.
* Removed more dead branches in _linear_algebra_functions.py.
* `tensordot` now properly handles negative `axes`. As per the array API, negative axes are not permitted.
* Adds `test_tensordot_type_matrix` to `test_usm_ndarray_linalg.py`.
* Addresses flaws in gemm tree kernel logic. Previously, assertions for calling a full tree reduction with only a single work-group of elements could be tripped; the kernel logic has been changed so that this is no longer possible.
* Implements `__matmul__`, `__imatmul__`, and `__rmatmul__` operators for usm_ndarray.
* Makes usm_ndarray operator argument names consistent.
* Test changes for `tensordot`. Adds a test for axes errors in `tensordot` for negative axes, and incorporates a test for `tensordot` promotion of both inputs into `test_tensordot_type_promotion`.
* Reverts running certain `matmul` tests under gdb.
* Fix typo in `test_tensordot_promotion`.
* Removes unnecessary input type checks in `matmul`.
* More tests added to `test_usm_linalg.py`. Adds several tests for `matmul` and expands some `tensordot` and `vecdot` tests to improve coverage.
* Use result_type with tensors to take device capability into account.
* Use the order keyword in the test of type promotion for matmul.
* Make generic k-threaded kernels handle arbitrary m_groups. Also increases hyperparameters for k-threaded kernels to improve performance.
* Adjusted dispatch logic for gemm kernels. Now uses m_groups = 4 when m > 4, and otherwise m_groups = 1, to improve performance.

---------
Co-authored-by: Oleksandr Pavlyk <[email protected]>
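The in-range guard called out in the list above follows a standard SYCL pattern: when a kernel's global range is rounded up to a multiple of the work-group size, the trailing work-items must skip their loads and stores. The following is a minimal sketch of that pattern under stated assumptions, not the actual dpctl kernel; the function name, the `data` buffer, and `n` are illustrative:

```cpp
#include <sycl/sycl.hpp>

// Hypothetical illustration: guard every access with an index check so that
// work-items in the rounded-up ND-range never touch out-of-bounds memory.
void scale_in_place(sycl::queue &q, float *data, size_t n)
{
    constexpr size_t wg_size = 256;
    // Round the global range up to a whole number of work-groups.
    const size_t n_groups = (n + wg_size - 1) / wg_size;

    q.submit([&](sycl::handler &cgh) {
         cgh.parallel_for(
             sycl::nd_range<1>{n_groups * wg_size, wg_size},
             [=](sycl::nd_item<1> it) {
                 const size_t i = it.get_global_linear_id();
                 if (i < n) { // access memory only if the index is in range
                     data[i] *= 2.0f;
                 }
             });
     }).wait();
}
```

Without the `i < n` check, up to `wg_size - 1` work-items per launch would read and write past the end of the buffer, which is exactly the kind of out-of-bounds access the commit above describes as the cause of the CI crashes.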
Check that `sycl::malloc_device` did not return `nullptr` to signal allocation failure.
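Unlike `operator new`, `sycl::malloc_device` reports USM allocation failure by returning `nullptr` rather than throwing, so the result has to be checked before the pointer is handed to a kernel. Below is a minimal sketch of this kind of guard; the function and variable names are illustrative assumptions, not the actual dpctl code:

```cpp
#include <sycl/sycl.hpp>
#include <new> // std::bad_alloc

// Hypothetical illustration of the guard this PR adds: sycl::malloc_device
// signals allocation failure by returning nullptr, which must be checked
// before the pointer is used in a kernel.
float *allocate_scratch(sycl::queue &q, size_t count)
{
    float *tmp = sycl::malloc_device<float>(count, q);
    if (tmp == nullptr) {
        // Surface the failure instead of letting kernels dereference nullptr.
        throw std::bad_alloc();
    }
    return tmp; // caller releases with sycl::free(tmp, q)
}
```

Throwing `std::bad_alloc` here is one reasonable policy; the essential point is that the `nullptr` result is detected at the allocation site rather than surfacing later as a crash inside a kernel.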