@@ -145,7 +145,7 @@ void ggml_internal_compute_forward_mul_mat(
     struct ggml_tensor * dst);


-// examples/mulmat-tune/mulmat-tune.h
+// ggml-tune.h

 struct ggml_task_stage {
     /* enum ggml_backend */ int backend;
@@ -250,7 +250,7 @@ Terms:
 - #1: the `q_f32` BLAS implementation in master (enabled when either
   `GGML_USE_ACCELERATE` or `GGML_USE_OPENBLAS` is defined)
 - #2: split `#1` into `INIT` and `COMPUTE`, where INIT runs de-quantization
-  with N threads, COMPUTE in with Accelerate with 1 thread.
+  with N threads, and COMPUTE runs with BLAS and 1 thread.

 `#0_0` is read as "profile #0, stage 0 (INIT)", `#0_1` is read as
 "profile #0, stage 1 (COMPUTE)", and `#0__` is read as total time.
@@ -521,7 +521,7 @@ total_time = init_time / nth + compute_time
 For any given M/N/K/n_threads, we can interpolate the time for M between the
 two nearest benched `M`s, whether or not M falls inside the bench range.

-See `ggml_mulmat_tune_estimate_time()` in file [mulmat-tune.c](./mulmat-tune.c)
+See `ggml_mulmat_tune_estimate_time()` in file [ggml-tune.c](../../ggml-tune.c)
 for how to estimate time.

 The linear interpolation (t = aM + b) should work well for N/K that are both > 0.
@@ -546,17 +546,17 @@ simple cache is used, the overhead MAY goes down to about 10 us.

 ## Wait-Notify Overhead

-Each call is about 10 us, may vary 5x. Since every mul_mat that run with-gpu
+Each call is about 10 us and may vary 5x. Since every mul_mat that runs with GPU/BLAS
 takes several ms to hundreds of ms, and the average boost is large, the
 wait-notify overhead is acceptable.

## High Level Guide to Code Review

 **Major Changes**

-- examples/mulmat-tune provides the tool, data file format and data
-  structure/APIs for graph compute. Some of them are expected be integrated into
-  ggml.c/ggml.h.
+- examples/mulmat-tune provides the bench tools.
+- ggml-tune.{c,h}: data file format and data structure/APIs for graph compute.
+  Some of them are expected to be integrated into ggml.c/ggml.h.
 - ggml.h: exposes a test function for mulmat-tune-bench.c; new fields and structs.
 - ggml.c: new threading framework; updates to `ggml_compute_forward_mul_mat()`;
   updated BLAS code for the new task config/profile; split COMPUTE into INIT +
@@ -571,7 +571,7 @@ I assume we agree that:

 1. Discuss and evaluate; determine whether this pull request makes sense.
 2. Fix and enhance, and rebase onto latest master.
-3. If it useful and being accepted, then split in to smaller pull requests.
+3. If it's useful and accepted, then split it into smaller pull requests.

 Here are the possible merge steps I have in mind: