Commit ec18316

breaking change: move/rename source files
1 parent d1c6664 commit ec18316

7 files changed: +609 -1316 lines changed

Makefile

Lines changed: 5 additions & 5 deletions
@@ -222,7 +222,7 @@ $(info I CC: $(CCV))
 $(info I CXX: $(CXXV))
 $(info )
 
-OBJS += mulmat-tune.o
+OBJS += ggml-tune.o
 
 #
 # Build library
@@ -290,14 +290,14 @@ benchmark-matmult: examples/benchmark/benchmark-matmult.cpp build-info.h ggml.o
 vdot: pocs/vdot/vdot.cpp ggml.o $(OBJS)
 	$(CXX) $(CXXFLAGS) $^ -o $@ $(LDFLAGS)
 
-mulmat-tune.o: examples/mulmat-tune/mulmat-tune.c
+ggml-tune.o: ggml-tune.c ggml-tune.h
 	$(CC) $(CFLAGS) -c $< -o $@
 
-mulmat-tune: examples/mulmat-tune/mulmat-tune-tool.c ggml.o $(OBJS)
+mulmat-tune: examples/mulmat-tune/mulmat-tune.c ggml.o $(OBJS)
 	$(CC) $(CFLAGS) $^ -o mulmat-tune $(LDFLAGS)
 
-test-mulmat-tune: tests/test-mulmat-tune.c ggml.o $(OBJS)
-	$(CC) $(CFLAGS) $^ -o tests/test-mulmat-tune $(LDFLAGS)
+test-ggml-tune: tests/test-ggml-tune.c ggml.o $(OBJS)
+	$(CC) $(CFLAGS) $^ -o tests/test-ggml-tune $(LDFLAGS)
 
 .PHONY: tests clean
 tests:

examples/mulmat-tune/README.md

Lines changed: 8 additions & 8 deletions
@@ -145,7 +145,7 @@ void ggml_internal_compute_forward_mul_mat(
 struct ggml_tensor * dst);
 
 
-// examples/mulmat-tune/mulmat-tune.h
+// ggml-tune.h
 
 struct ggml_task_stage {
 /*enum ggml_backend*/ int backend;
@@ -250,7 +250,7 @@ Terms:
 - #1: the `q_f32` BLAS implementation in master (when defined either
 `GGML_USE_ACCELERATE` or `GGML_USE_OPENBLAS`)
 - #2: split `#1` into `INIT` and `COMPUTE`. Where INIT runs de-quantization
-with N threads, COMPUTE in with Accelerate with 1 thread.
+with N threads, COMPUTE with BLAS and 1 thread.
 
 The `#0_0` is read as "profile #0, stage 0 (INIT)", the `#0_1` is read as
 "profile #0 stage 1 (COMPUTE)". "#0__" is read as total time.
@@ -521,7 +521,7 @@ total_time = init_time / nth + compute_time
 For any given M/N/K/n_threads, we can interpolate time for M between the nearest
 two `M`s whenever is in bench range or not.
 
-See `ggml_mulmat_tune_estimate_time()` in file [mulmat-tune.c](./mulmat-tune.c)
+See `ggml_mulmat_tune_estimate_time()` in file [ggml-tune.c](../../ggmlt-tune.c)
 for how to estimate time.
 
 The linear interpolate (t = aM + b) should works well for N/K that are both > 0.
@@ -546,17 +546,17 @@ simple cache is used, the overhead MAY goes down to about 10 us.
 
 ## Wait-Notify Overhead
 
-Each call is about 10 us, may vary 5x. Since every mul_mat that run with-gpu
+Each call is about 10 us, may vary 5x. Since every mul_mat that run with GPU/BLAS
 takes several ms to hundreds of ms, and the average boost is large, so the
 wait-notify overhead is acceptable.
 
 ## High Level Guide to Code Review
 
 **Major Changes**
 
-- examples/mulmat-tune provides the tool, data file format and data
-structure/APIs for graph compute. Some of them are expected be integrated into
-ggml.c/ggml.h.
+- examples/mulmat-tune provides the bench tools
+- ggml-tune.{c,h}: data file format and data structure/APIs for graph compute.
+Some of them are expected be integrated into ggml.c/ggml.h.
 - ggml.h: exposes a test function for mulmat-tune-bench.c; new fields and structs.
 - ggml.c: new threading framework, update to `ggml_compute_forward_mul_mat()`.
 updated BLAS codes for the new task config/profile; split COMPUTE into INIT +
@@ -571,7 +571,7 @@ I assume we agree that:
 
 1. Discuss and evaluate, determine whether this pull request make sense.
 2. Fix and enhance, and rebase onto latest master.
-3. If it useful and being accepted, then split in to smaller pull requests.
+3. If it's useful and being accepted, then split in to smaller pull requests.
 
 Here is the possible merge steps I think:
