[GGML][RPC] Support for models with non-512-aligned tensors over RPC. #11047

matt23654 · 2025-01-02T17:31:36Z

This PR adds support for quantized tensors with sizes not divisible by 512, such as those found in quantized versions of the Qwen2.5 72B model. Please also see discussion in #10943.

Changes

Implemented forwarding to the CUDA backend on the server for init_tensor and get_alloc_size calls.
Forwarding all calls would result in unacceptable latency (in testing, throughput dropped to 0.02 tok/s), so at the moment only calls for misaligned tensors are forwarded. There may be a better way of handling this in the future.
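The selective forwarding described above can be illustrated with a minimal sketch. It is not the code from this PR: `TensorInfo`, `kRowAlignment`, and `needs_forwarding` are hypothetical names, and the sketch assumes that "misaligned" means a quantized tensor whose first dimension is not a multiple of 512 elements.

```cpp
#include <cstdint>

// Hypothetical stand-in for the tensor fields that matter for this decision.
// The real ggml tensor struct carries much more state.
struct TensorInfo {
    bool    quantized; // true for quantized types such as Q4_K
    int64_t ne0;       // number of elements in the first dimension (assumed to be the relevant size)
};

// Alignment taken from the PR title ("non-512-aligned tensors").
constexpr int64_t kRowAlignment = 512;

// Returns true when the client would forward init_tensor / get_alloc_size
// to the server backend instead of handling the call locally.
// Only quantized, misaligned tensors are forwarded, which keeps the extra
// network round trips off the common path.
constexpr bool needs_forwarding(const TensorInfo & t) {
    return t.quantized && (t.ne0 % kRowAlignment != 0);
}
```

With illustrative row lengths, `needs_forwarding({true, 8192})` is false because 8192 is a multiple of 512, while `needs_forwarding({true, 29568})` is true because 29568 leaves a remainder of 384. A non-quantized tensor is never forwarded under this assumption, whatever its size.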

Performance Impact

Qwen2.5 72B Q4_K_M: Coherent at ~4 tokens/s over GbE with a mix of GPU/RPC/CPU (previously it produced garbage output).
Existing models (e.g. LLaMA 3.3 70B Q4_K_M): Unaffected (~7 tokens/s over GbE with GPU/RPC).

Testing

Perplexity validation with Tulu-3 8B:
- Without RPC: 6.9362 ± 0.04526
- With RPC: 6.9362 ± 0.04526 (identical results)
test-backend-ops: Passed for supported backends
Note: Unable to run full CI due to bandwidth limitations with model downloads

rgerganov

looks fine to me, some minor comments inline

ggml/src/ggml-rpc/ggml-rpc.cpp

Co-authored-by: Diego Devesa <[email protected]>

…ggml-org#11047)

* Added init tensor calling code
* Added get_alloc_size forwarding
* Cleaned up and improved type/error handling.
* fix: remove trailing whitespaces.
* Cleanup and use GGML error logging functions.
* Handle potentially dangerous edge cases.
* Apply suggestions from code review

Co-authored-by: Diego Devesa <[email protected]>

---------

Co-authored-by: Diego Devesa <[email protected]>

matt23654 added 3 commits December 31, 2024 21:56

Added init tensor calling code

7aad6cb

Added get_alloc_size forwarding

c47dc70

Cleaned up and improved type/error handling.

1948ae8

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Jan 2, 2025

fix: remove trailing whitespaces.

840594f

rgerganov approved these changes Jan 3, 2025

ggml/src/ggml-rpc/ggml-rpc.cpp (4 review threads, outdated and resolved)

Cleanup and use GGML error logging functions.

b66e91b

slaren reviewed Jan 3, 2025

ggml/src/ggml-rpc/ggml-rpc.cpp (1 review thread, resolved)

Handle potentially dangerous edge cases.

c111e8a

slaren approved these changes Jan 3, 2025

ggml/src/ggml-rpc/ggml-rpc.cpp (3 review threads, outdated and resolved)

Apply suggestions from code review

4973a29

Co-authored-by: Diego Devesa <[email protected]>

slaren merged commit f922a9c into ggml-org:master Jan 4, 2025
48 checks passed

thxCode mentioned this pull request Jan 23, 2025

Failed to distribute offload the qwen2-vl model on macOS gpustack/gpustack#1057

Closed

saood06 mentioned this pull request Feb 8, 2025

RPC sync ikawrakow/ik_llama.cpp#193

Closed
