
Revert gh-2082 with w/a for gemm issue in OneMKL #2101


Merged · 4 commits · Oct 14, 2024

Changes from all commits:

1 change: 0 additions & 1 deletion CHANGELOG.md
@@ -128,7 +128,6 @@ In addition, this release completes implementation of `dpnp.fft` module and adds
 * Fixed a crash in `dpnp.choose` caused by missing control of releasing temporary allocated device memory [#2063](https://github.com/IntelPython/dpnp/pull/2063)
 * Resolved compilation warning and error while building in debug mode [#2066](https://github.com/IntelPython/dpnp/pull/2066)
 * Fixed an issue with asynchronous execution in `dpnp.fft` module [#2067](https://github.com/IntelPython/dpnp/pull/2067)
-* Added a workaround to fix the incorrect result from `dpnp.matmul` computing on Lunar Lake or Arrow Lake Battlemage graphics [#2082](https://github.com/IntelPython/dpnp/pull/2082)

 ## [0.15.0] - 05/25/2024

7 changes: 0 additions & 7 deletions dpnp/backend/extensions/blas/blas_py.cpp
@@ -127,13 +127,6 @@ PYBIND11_MODULE(_blas_impl, m)
               py::arg("resultC"), py::arg("depends") = py::list());
     }

-    {
-        m.def("_is_lnl_bm_architecture", &blas_ns::_is_lnl_bm_architecture,
-              "Return ``True`` if SYCL device belongs to either Lunar Lake or "
-              "Battlemage G21 Intel GPU architecture",
-              py::arg("device"));
-    }
-
     {
         m.def("_gemm_batch", &blas_ns::gemm_batch,
               "Call `gemm_batch` from OneMKL BLAS library to compute "

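Before this revert, the binding removed above was exposed through the `_blas_impl` pybind11 extension and consumed from Python inside `dpnp_matmul` (see the removed hunk further down). A minimal sketch of that call, assuming a pre-revert dpnp build and that the extension module path is `dpnp.backend.extensions.blas._blas_impl`:

```python
import dpctl
import dpnp.backend.extensions.blas._blas_impl as bi  # module path assumed; pre-revert builds only

q = dpctl.SyclQueue()
# True only for Lunar Lake / Battlemage G21 Intel GPU devices
print(bi._is_lnl_bm_architecture(q.get_sycl_device()))
```
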
16 changes: 0 additions & 16 deletions dpnp/backend/extensions/blas/gemm.cpp
@@ -323,22 +323,6 @@ std::tuple<sycl::event, sycl::event, bool>
     return std::make_tuple(args_ev, gemm_ev, is_row_major);
 }

-bool _is_lnl_bm_architecture(const sycl::device &dev)
-{
-#if !defined(USE_ONEMKL_CUBLAS)
-    namespace syclex = sycl::ext::oneapi::experimental;
-    const auto arch = dev.get_info<syclex::info::device::architecture>();
-    switch (arch) {
-    case syclex::architecture::intel_gpu_lnl_m: /* Lunar Lake */
-    case syclex::architecture::intel_gpu_bmg_g21: /* Battlemage G21 */
-        return true;
-    default:
-        return false;
-    }
-#endif // !defined(USE_ONEMKL_CUBLAS)
-    return false;
-}
-
 template <typename fnT, typename Tab, typename Tc>
 struct GemmContigFactory
 {

2 changes: 0 additions & 2 deletions dpnp/backend/extensions/blas/gemm.hpp
@@ -39,8 +39,6 @@ extern std::tuple<sycl::event, sycl::event, bool>
          const dpctl::tensor::usm_ndarray &resultC,
          const std::vector<sycl::event> &depends);

-extern bool _is_lnl_bm_architecture(const sycl::device &dev);
-
 extern std::tuple<sycl::event, sycl::event, bool>
     gemm_batch(sycl::queue &exec_q,
                const dpctl::tensor::usm_ndarray &matrixA,

28 changes: 0 additions & 28 deletions dpnp/dpnp_utils/dpnp_utils_linearalgebra.py
@@ -894,34 +894,6 @@ def dpnp_matmul(
             )
             _manager.add_event_pair(ht_ev, gemv_ev)
         elif call_flag == "gemm":
-            # MKLD-17976: due to known issue in OneMKL on Lunar Lake and
-            # Battlemage G21 Intel GPU architectures, it forces
-            # to implement a temporary workaround with extra copying of
-            # an input array in case when it has a small size and
-            # non-zero offset
-            # The issue was detected by failing tests for eig/eigh
-            # TODO: remove the workaround once OneMKL issue is resolved
-            if bi._is_lnl_bm_architecture(exec_q.get_sycl_device()):
-
-                def _need_to_copy(a):
-                    a_usm = dpnp.get_usm_ndarray(a)
-                    if a_usm._element_offset > 0 and a_usm.size < 16:
-                        return True
-                    return False
-
-                x1 = _copy_array(
-                    x1,
-                    copy_flag=_need_to_copy(x1),
-                    dtype=compute_dtype,
-                    order=res_order,
-                )
-                x2 = _copy_array(
-                    x2,
-                    copy_flag=_need_to_copy(x2),
-                    dtype=compute_dtype,
-                    order=res_order,
-                )
-
             result = _gemm_matmul(
                 exec_q,
                 x1,

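For context, the condition the removed `_need_to_copy` helper tested can be observed directly. A minimal sketch (it relies on the private `_element_offset` attribute of `dpctl.tensor.usm_ndarray`, exactly as the removed code did):

```python
import dpnp

a = dpnp.ones((2, 3, 3), dtype="f8")
view = a[1]  # slicing off the first axis yields a view with a non-zero USM element offset

usm = dpnp.get_usm_ndarray(view)
# element offset 9 > 0 and size 9 < 16, so this operand would have been copied
print(usm._element_offset, usm.size)
```

The new regression test below exercises exactly this pattern via `ia[1] @ ib[1]`.
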
18 changes: 18 additions & 0 deletions tests/test_mathematical.py
@@ -3885,6 +3885,24 @@ def test_linalg_matmul(self):
         expected = numpy.linalg.matmul(a, b)
         assert_array_equal(result, expected)

+    @pytest.mark.parametrize(
+        "sh1, sh2",
+        [
+            ((2, 3, 3), (2, 3, 3)),
+            ((3, 3, 3, 3), (3, 3, 3, 3)),
+        ],
+        ids=["gemm", "gemm_batch"],
+    )
+    def test_matmul_with_offsets(self, sh1, sh2):
+        size1, size2 = numpy.prod(sh1, dtype=int), numpy.prod(sh2, dtype=int)
+        a = numpy.random.randint(-5, 5, size1).reshape(sh1).astype("f8")
+        b = numpy.random.randint(-5, 5, size2).reshape(sh2).astype("f8")
+        ia, ib = dpnp.array(a), dpnp.array(b)
+
+        result = ia[1] @ ib[1]
+        expected = a[1] @ b[1]
+        assert_array_equal(result, expected)
+

 class TestMatmulInvalidCases:
     @pytest.mark.parametrize(

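The added regression test can be run on its own with standard pytest selection, e.g. `pytest tests/test_mathematical.py -k test_matmul_with_offsets` (assuming a dpnp development environment with a SYCL device available).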