SYCL: Introducing memory host pool #11251


Merged · 6 commits · Jan 19, 2025
Conversation

@s-Nick (Collaborator) commented on Jan 15, 2025:

This patch adds a host memory pool dedicated to matrix_info_t. This struct is used when calling gemm_batch_impl and currently forces a host_task synchronization in order to free its memory. After this patch that synchronization is no longer necessary: the pool guarantees that the memory stays usable long enough for the kernel launch, and it is freed at the end of the program.

The pool's design is deliberately naive and for now it is usable only with the type reported above; further improvements are possible but not necessary at the moment.

Full test pass on A100 and Intel PVC.
Benchmark results for NVIDIA:

FP32

Current

| model | size | params | backend | ngl | threads | sm | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | SYCL | 99 | 8 | none | 0 | pp512 | 759.05 ± 4.70 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | SYCL | 99 | 8 | none | 0 | tg128 | 89.10 ± 0.16 |
| llama 70B Q4_K - Small | 37.57 GiB | 70.55 B | SYCL | 99 | 8 | none | 0 | pp512 | 72.93 ± 0.31 |
| llama 70B Q4_K - Small | 37.57 GiB | 70.55 B | SYCL | 99 | 8 | none | 0 | tg128 | 18.17 ± 0.02 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | 0 | pp512 | 743.26 ± 1.85 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | 0 | tg128 | 91.47 ± 0.07 |

build: a29f087 (4473)

This patch

| model | size | params | backend | ngl | threads | sm | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | SYCL | 99 | 8 | none | 0 | pp512 | 969.44 ± 7.14 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | SYCL | 99 | 8 | none | 0 | tg128 | 89.98 ± 0.18 |
| llama 70B Q4_K - Small | 37.57 GiB | 70.55 B | SYCL | 99 | 8 | none | 0 | pp512 | 104.87 ± 0.21 |
| llama 70B Q4_K - Small | 37.57 GiB | 70.55 B | SYCL | 99 | 8 | none | 0 | tg128 | 18.27 ± 0.01 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | 0 | pp512 | 957.31 ± 2.29 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | 0 | tg128 | 92.51 ± 0.05 |

build: e878c29 (4476)

FP16

Current

| model | size | params | backend | ngl | threads | sm | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | SYCL | 99 | 8 | none | 0 | pp512 | 5557.98 ± 18.23 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | SYCL | 99 | 8 | none | 0 | tg128 | 88.98 ± 0.19 |
| llama 70B Q4_K - Small | 37.57 GiB | 70.55 B | SYCL | 99 | 8 | none | 0 | pp512 | 694.68 ± 1.90 |
| llama 70B Q4_K - Small | 37.57 GiB | 70.55 B | SYCL | 99 | 8 | none | 0 | tg128 | 18.15 ± 0.02 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | 0 | pp512 | 5375.49 ± 23.61 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | 0 | tg128 | 91.56 ± 0.06 |

build: a29f087 (4473)

This patch

| model | size | params | backend | ngl | threads | sm | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | SYCL | 99 | 8 | none | 0 | pp512 | 5606.51 ± 79.11 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | SYCL | 99 | 8 | none | 0 | tg128 | 87.32 ± 6.34 |
| llama 70B Q4_K - Small | 37.57 GiB | 70.55 B | SYCL | 99 | 8 | none | 0 | pp512 | 707.81 ± 3.43 |
| llama 70B Q4_K - Small | 37.57 GiB | 70.55 B | SYCL | 99 | 8 | none | 0 | tg128 | 18.35 ± 0.06 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | 0 | pp512 | 5404.64 ± 16.42 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | 0 | tg128 | 92.62 ± 0.07 |

build: e878c29 (4476)

Creating a new memory pool on the host to store memory location for
matrix_info needed to launch gemm_batch from oneMKL/oneMath.
Removing complex support in gemm_batch since it is not used in llama.cpp

Signed-off-by: nscipione <[email protected]>
@s-Nick requested review from Alcpz and Rbiessy on January 15, 2025 at 10:25
```cpp
{
    if (scaling_type == library_data_t::real_float &&
```
@s-Nick (Collaborator, author) commented:
The complex type doesn't seem to be used anywhere, nor in any other backend, so I removed it to simplify and clean things up.

```diff
@@ -3363,6 +3449,7 @@ static void ggml_sycl_mul_mat_batched_sycl(ggml_backend_sycl_context & ctx,

     ggml_sycl_pool_alloc<const void *> ptrs_src(ctx.pool(), 2*ne23);
     ggml_sycl_pool_alloc<      void *> ptrs_dst(ctx.pool(), 1*ne23);
+    ggml_sycl_pool_alloc<matrix_info_t<float>> matrix_info(ctx.host_pool(), 1);
```
@s-Nick (Collaborator, author) commented:
The type passed to matrix_info_t is float because it can be cast to the other types needed to call the oneMath functions. Unfortunately it has to be fixed here, given the current design and the include files required.

The github-actions bot added the labels **ggml** (changes relating to the ggml tensor library for machine learning) and **SYCL** (https://en.wikipedia.org/wiki/SYCL - GPU programming language) on Jan 15, 2025.
@Rbiessy (Collaborator) left a comment:
LGTM overall!

@Alcpz (Collaborator) left a comment:
Hey, thanks for the contribution! I think the host_pool is a good idea, since the use of host_tasks can introduce performance overheads that are not easy to pinpoint.

I left a comment to discuss a bit the design. Let me know your thoughts there.

@qnixsynapse (Collaborator) left a comment:
`device(device_)` should be initialized after `qptr(qptr_)` in `ggml_sycl_pool_host`, I think.

```
/home/runner/work/llama.cpp/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:1193:9: warning: field 'qptr' will be initialized after field 'device' [-Wreorder-ctor]
 1193 |         qptr(qptr_),
      |         ^~~~~~~~~~~
      |         device(device_)
 1194 |         device(device_) {
      |         ~~~~~~~~~~~~~~~
      |         qptr(qptr_)
```

s-Nick commented Jan 15, 2025

Thank you for the comment @qnixsynapse, I addressed it in 6b77639.

@s-Nick marked this pull request as ready for review on January 15, 2025 at 16:55
@qnixsynapse (Collaborator) commented:

@s-Nick Sorry for being a bit fussy about compiler warnings. There is one tiny issue left, which can be fixed by adding a `return nullptr;` before the closing bracket of the `alloc` function; I have left a comment pointing it out.

So far everything else LGTM.


s-Nick commented Jan 16, 2025

@qnixsynapse don't worry, more eyes are always better; thank you for reviewing it. I addressed it in 963b685.

@qnixsynapse (Collaborator) commented:

@s-Nick Thanks. LGTM!

@airMeng merged commit 99487b5 into ggml-org:master on Jan 19, 2025
48 checks passed
anagri pushed a commit to BodhiSearch/llama.cpp that referenced this pull request Jan 26, 2025
* Implement host pool for matrix_info

Creating a new memory pool on the host to store memory location for
matrix_info needed to launch gemm_batch from oneMKL/oneMath.
Removing complex support in gemm_batch since it is not used in llama.cpp

* Remove unnecessary headers and cast

* Reorder member variable to avoid warning on initialization

* Formatting

* Remove unused variable

* Address PR review feedback - remove warning

---------

Signed-off-by: nscipione <[email protected]>
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025
Labels: ggml (changes relating to the ggml tensor library for machine learning), SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language)
6 participants