-
Notifications
You must be signed in to change notification settings - Fork 12.2k
SYCL : Move to compile time oneMKL interface backend selection for NVIDIA backend #10584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… NVIDIA backend Move to compile time selection to backend to avoid latency at run time. Add it to all mkl gemm calls and only for NVIDIA backend. Signed-off-by: nscipione <[email protected]>
@Alcpz Could you check it out? |
@Rbiessy Feel free to give it a look, since you have experience working with oneMKL interface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments for other update.
@@ -1690,8 +1690,12 @@ namespace dpct | |||
auto data_b = get_memory<const Tb>(b); | |||
auto data_c = get_memory<Tc>(c); | |||
oneapi::mkl::blas::column_major::gemm( | |||
q, a_trans, b_trans, m, n, k, alpha_value, data_a, lda, | |||
data_b, ldb, beta_value, data_c, ldc); | |||
#ifdef GGML_SYCL_NVIDIA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The macro make the code is hard to understand.
I suggest:
#ifdef GGML_SYCL_NVIDIA
oneapi::mkl::blas::column_major::gemm(
oneapi::mkl::backend_selector<oneapi::mkl::backend::cublas>{ q },
a_trans, b_trans, m, n, k, alpha_value, data_a, lda,
data_b, ldb, beta_value, data_c, ldc);
}
#else
oneapi::mkl::blas::column_major::gemm(
q,
a_trans, b_trans, m, n, k, alpha_value, data_a, lda,
data_b, ldb, beta_value, data_c, ldc);
}
#endif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we start adding support for Intel GPU as well I think it would make more sense to have a helper function that returns either a backend_selector
or a queue based on the backend.
It would avoid duplicating the call to gemm which I think is a risk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remember, the SYCL backend is initiated to support Intel GPU. :)
Support more vendor GPUs is added later.
The default code path should be optimized for Intel GPU.
It's OK to set special queue for other vendor GPUs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code update for readability in f6e6fc4
@s-Nick |
Thank you for your review @NeoZhangJianyu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The oneMKL Interface changes look good to me.
OK, I see! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes lgtm. Let's wait for the remaining thread to be resolved before merging.
…IDIA backend (ggml-org#10584) * [SYCL] Move to Compile Time backend selection on oneMKL Interface for NVIDIA backend Move to compile time selection to backend to avoid latency at run time. Add it to all mkl gemm calls and only for NVIDIA backend. Signed-off-by: nscipione <[email protected]> * Formatting * Address PR comments to increase readibility --------- Signed-off-by: nscipione <[email protected]>
…IDIA backend (ggml-org#10584) * [SYCL] Move to Compile Time backend selection on oneMKL Interface for NVIDIA backend Move to compile time selection to backend to avoid latency at run time. Add it to all mkl gemm calls and only for NVIDIA backend. Signed-off-by: nscipione <[email protected]> * Formatting * Address PR comments to increase readibility --------- Signed-off-by: nscipione <[email protected]>
This patch move oneMKL interface calls to gemm from real time to compile time for NVIDIA backend, bringing improvements specially in text generation.
Tested on A100:
Current
build: 0f77aae (20)
With changes
build: ffd0a99 (4222)