SYCL: Introducing memory host pool #11251
Conversation
Creating a new memory pool on the host to store the memory locations of matrix_info needed to launch gemm_batch from oneMKL/oneMath. Removing complex support from gemm_batch since it is not used in llama.cpp.

Signed-off-by: nscipione <[email protected]>
{
    if (scaling_type == library_data_t::real_float &&
The complex type doesn't seem to be used anywhere, or in any other backend, so I removed it to simplify/clean things up.
ggml/src/ggml-sycl/ggml-sycl.cpp
Outdated
@@ -3363,6 +3449,7 @@ static void ggml_sycl_mul_mat_batched_sycl(ggml_backend_sycl_context & ctx,

    ggml_sycl_pool_alloc<const void *> ptrs_src(ctx.pool(), 2*ne23);
    ggml_sycl_pool_alloc<      void *> ptrs_dst(ctx.pool(), 1*ne23);
    ggml_sycl_pool_alloc<matrix_info_t<float>> matrix_info(ctx.host_pool(), 1);
The type passed to matrix_info_t is float because it allows the allocation to be cast to the other types necessary to call oneMath functions. Unfortunately it needs to be set here, given the current design and the include files required.
LGTM overall!
Hey, thanks for the contribution! I think the host_pool is a good idea, since the use of host_tasks can introduce performance overheads that are not easy to pinpoint.
I left a comment to discuss the design a bit. Let me know your thoughts there.
device(device_) should be initialized after qptr(qptr_) in ggml_sycl_pool_host, I think:
/home/runner/work/llama.cpp/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:1193:9: warning: field 'qptr' will be initialized after field 'device' [-Wreorder-ctor]
1193 | qptr(qptr_),
| ^~~~~~~~~~~
| device(device_)
1194 | device(device_) {
| ~~~~~~~~~~~~~~~
| qptr(qptr_)
Thank you for the comment @qnixsynapse, I addressed it in 6b77639.
@s-Nick Sorry for being a bit fussy about compiler warnings. There is one tiny issue left which can be fixed by adding a … So far everything else LGTM.
@qnixsynapse don't worry, more eyes are always better, thank you for reviewing it. I addressed it in 963b685.
@s-Nick Thanks. LGTM!
* Implement host pool for matrix_info

  Creating a new memory pool on the host to store memory location for matrix_info needed to launch gemm_batch from oneMKL/oneMath. Removing complex support in gemm_batch since it is not used in llama.cpp

* Remove unnecessary headers and cast
* Reorder member variable to avoid warning on initialization
* Formatting
* Remove unused variable
* Address PR review feedback - remove warning

Signed-off-by: nscipione <[email protected]>
This patch adds a host memory pool dedicated to matrix_info_t. This struct is used when calling gemm_batch_impl and currently forces a host_task synchronization to free memory. After this patch that is no longer necessary: the pool ensures the memory stays usable long enough to launch the kernel, and it is freed at the end of the program. The pool's design is naive and is currently usable only with the type reported above; further improvements are possible but not necessary at the moment.
Full test suite passes on A100 and Intel PVC.
Benchmark results for NVIDIA:

FP32
- Current: build a29f087 (4473)
- This patch: build e878c29 (4476)

FP16
- Current: build a29f087 (4473)
- This patch: build e878c29 (4476)