Fine tune MUL_MAT, new threading (spin+wait/notify), speedup q_f32 BLAS by splitting COMPUTE stage #1632
Closed
24 commits
All commits are by mqy:
213f133 initial
1b041d7 threading test: improve readability at both codes and output
48016f6 bulk refactored task profile to support complete fallback; enable tun…
9106232 threading test: At github, Windows can take more than 20 seconds to s…
bb590f1 Workrounnd to set node->backend
7c05049 tunning: check GPU offloading before loading model
21e9379 tunning: add f16, todo: f32 failed with CL
5342dc0 tunning: support k_quants; disabled rope shapes (workaround); make ca…
6b83a3e try make CL run w/o tunning, but -ngl stucks no output. had to add ta…
06b0082 bulk refactoring task profile and related to run CL GPU offloading.
67bb367 typos
2193ab6 fix cuda build error
0ec4dab fixed break and asssion from select; try fix cuda link error
5abb8ae fix warning
5feefb3 threading: add suspend/resume APIs, so it's possible to run a thread …
286c5b3 threadng: remove unnecessary spin lock/unlock from suspend/resume; ad…
9872863 threading test: less loops to avoid timeout
6609c22 fixed OP_OUT_PROD and OP_NONE
65fd65e tune: update readme
44b831d tune: extract ggml_mulmat_tune_bench_wrapper
4d32b40 threading test: decrease a threshold value to avoid timeout
cc8a375 threading: fix deadlock by reverting part of changes from commit 286c…
aac7f7c threading: try to fix a deadlock, also added critical deadlock detection
08972d2 threading: removed feature wait_on_done to figure out causes of deadl…
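The PR title describes a "spin+wait/notify" threading scheme, and several later commits deal with suspend/resume and deadlocks in it. ggml-threading.c itself is not shown on this page, so the sketch below illustrates only the general spin-then-block handoff, assuming POSIX threads and C11 `<stdatomic.h>`. All names (`spin_wait_slot`, `slot_post`, `slot_wait`, `run_ping_pong`) are hypothetical and are not the PR's API. A waiter polls an atomic flag for up to `spin_limit` iterations and only then sleeps on a condition variable; the poster sets the flag under the mutex before signaling, so a wakeup cannot be lost.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical spin-then-block handoff flag; not ggml-threading.c's API. */
typedef struct {
    atomic_int      pending;    /* 1 = task posted, 0 = consumed */
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             spin_limit; /* polls before falling back to blocking */
} spin_wait_slot;

static void slot_init(spin_wait_slot *s, int spin_limit) {
    atomic_init(&s->pending, 0);
    pthread_mutex_init(&s->mutex, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->spin_limit = spin_limit;
}

static void slot_destroy(spin_wait_slot *s) {
    pthread_cond_destroy(&s->cond);
    pthread_mutex_destroy(&s->mutex);
}

/* Notify: publish under the mutex so a waiter between its last check and
   pthread_cond_wait cannot miss the signal (no lost wakeup). */
static void slot_post(spin_wait_slot *s) {
    pthread_mutex_lock(&s->mutex);
    atomic_store_explicit(&s->pending, 1, memory_order_release);
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->mutex);
}

/* Wait: spin first (cheap when tasks arrive back to back), then block so an
   idle thread stops burning CPU. Returns true if taken while spinning. */
static bool slot_wait(spin_wait_slot *s) {
    for (int i = 0; i < s->spin_limit; i++) {
        int expected = 1;
        if (atomic_compare_exchange_weak_explicit(&s->pending, &expected, 0,
                memory_order_acquire, memory_order_relaxed)) {
            return true;
        }
    }
    pthread_mutex_lock(&s->mutex);
    while (atomic_load_explicit(&s->pending, memory_order_acquire) == 0) {
        pthread_cond_wait(&s->cond, &s->mutex);
    }
    atomic_store_explicit(&s->pending, 0, memory_order_relaxed);
    pthread_mutex_unlock(&s->mutex);
    return false;
}

/* Demo: main hands `rounds` tasks to one worker, one at a time. */
typedef struct {
    spin_wait_slot to_worker, to_main;
    atomic_bool    stop;
    int            done; /* written by the worker, read by main after join */
} ping_pong;

static void *worker_main(void *arg) {
    ping_pong *pp = arg;
    for (;;) {
        slot_wait(&pp->to_worker);
        if (atomic_load(&pp->stop)) break;
        pp->done++;                 /* stand-in for a compute task */
        slot_post(&pp->to_main);
    }
    return NULL;
}

static int run_ping_pong(int rounds, int spin_limit) {
    ping_pong pp;
    slot_init(&pp.to_worker, spin_limit);
    slot_init(&pp.to_main, spin_limit);
    atomic_init(&pp.stop, false);
    pp.done = 0;
    int result = -1;
    pthread_t tid;
    if (pthread_create(&tid, NULL, worker_main, &pp) == 0) {
        for (int i = 0; i < rounds; i++) {
            slot_post(&pp.to_worker);
            slot_wait(&pp.to_main);
        }
        atomic_store(&pp.stop, true);
        slot_post(&pp.to_worker);   /* wake the worker so it sees `stop` */
        pthread_join(tid, NULL);
        /* every post was consumed by exactly one wait */
        assert(atomic_load(&pp.to_worker.pending) == 0 &&
               atomic_load(&pp.to_main.pending) == 0);
        result = pp.done;
    }
    slot_destroy(&pp.to_worker);
    slot_destroy(&pp.to_main);
    return result;
}
```

`run_ping_pong(1000, 1000)` should return 1000, and `spin_limit = 0` forces every wait through the blocking path. Spinning trades CPU time for lower handoff latency between back-to-back tasks; one common refinement, not shown here, is to count sleeping waiters so that `slot_post` can skip the mutex and signal when nobody is blocked.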
@@ -40,6 +40,7 @@ models/*
 /server
 /Pipfile
 /libllama.so
+/mulmat-tune

 build-info.h
 arm_neon.h
@@ -1,5 +1,5 @@
 # Define the default target now so that it is always the first target
-BUILD_TARGETS = main quantize quantize-stats perplexity embedding vdot train-text-from-scratch simple
+BUILD_TARGETS = main quantize quantize-stats perplexity embedding vdot train-text-from-scratch simple mulmat-tune

 ifdef LLAMA_BUILD_SERVER
 BUILD_TARGETS += server
@@ -47,7 +47,8 @@ endif
 OPT = -O3
 CFLAGS = -I. $(OPT) -std=c11 -fPIC
 CXXFLAGS = -I. -I./examples $(OPT) -std=c++11 -fPIC
-LDFLAGS =
+# -lm fixed error: ggml.o: undefined reference to symbol 'tanhf@@GLIBC_2.2.5' from ubuntu 22.04
+LDFLAGS = -lm

 ifdef LLAMA_DEBUG
 CFLAGS += -O0 -g
@@ -134,8 +135,7 @@ ifndef LLAMA_NO_K_QUANTS
 endif

 ifndef LLAMA_NO_ACCELERATE
-# Mac M1 - include Accelerate framework.
-# `-framework Accelerate` works on Mac Intel as well, with negliable performance boost (as of the predict time).
+# Mac Intel & M1 - include Accelerate framework.
 ifeq ($(UNAME_S),Darwin)
 CFLAGS += -DGGML_USE_ACCELERATE
 LDFLAGS += -framework Accelerate
@@ -145,10 +145,16 @@ endif # LLAMA_NO_ACCELERATE
 ifdef LLAMA_OPENBLAS
 CFLAGS += -DGGML_USE_OPENBLAS -I/usr/local/include/openblas -I/usr/include/openblas
 LDFLAGS += -lopenblas
+ifeq ($(UNAME_S),Darwin)
+# openblas installed with Homebew on macOS.
+CFLAGS += -I/usr/local/opt/openblas/include
+LDFLAGS += -L/usr/local/opt/openblas/lib
+endif
 endif # LLAMA_OPENBLAS

 ifdef LLAMA_BLIS
 CFLAGS += -DGGML_USE_OPENBLAS -I/usr/local/include/blis -I/usr/include/blis
+CFLAGS += -DGGML_BLAS_VENDOR="\"BLIS\""
 LDFLAGS += -lblis -L/usr/local/lib
 endif # LLAMA_BLIS

@@ -225,11 +231,16 @@ ifneq ($(filter armv8%,$(UNAME_M)),)
 CFLAGS += -mfp16-format=ieee -mno-unaligned-access
 endif

-ifdef LLAMA_NO_K_QUANTS
+ifndef LLAMA_NO_K_QUANTS
 k_quants.o: k_quants.c k_quants.h
 $(CC) $(CFLAGS) -c $< -o $@
 endif # LLAMA_NO_K_QUANTS

+ifndef LLAMA_NO_TUNE
+CFLAGS += -DGGML_USE_TUNE #-DGGML_TUNE_NDEBUG
+CXXFLAGS += -DGGML_USE_TUNE
+endif
+
 #
 # Print build information
 #
@@ -245,6 +256,8 @@ $(info I CC: $(CCV))
 $(info I CXX: $(CXXV))
 $(info )

+OBJS += ggml-tune.o ggml-threading.o
+
 #
 # Build library
 #
@@ -253,7 +266,12 @@ ggml.o: ggml.c ggml.h ggml-cuda.h
 $(CC) $(CFLAGS) -c $< -o $@

 llama.o: llama.cpp ggml.h ggml-cuda.h ggml-metal.h llama.h llama-util.h
 $(CXX) $(CXXFLAGS) -c $< -o $@

+ggml-threading.o: ggml-threading.c ggml.h
+$(CC) $(CFLAGS) -c $< -o $@
+
+ggml-tune.o: ggml-tune.c ggml.h
+$(CC) $(CFLAGS) -c $< -o $@
+
 common.o: examples/common.cpp examples/common.h
 $(CXX) $(CXXFLAGS) -c $< -o $@
@@ -298,6 +316,9 @@ server: examples/server/server.cpp examples/server/httplib.h examples/server/jso
 train-text-from-scratch: examples/train-text-from-scratch/train-text-from-scratch.cpp build-info.h ggml.o llama.o $(OBJS)
 $(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS)

+mulmat-tune: examples/mulmat-tune/mulmat-tune.cpp build-info.h ggml.o $(OBJS)
+$(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o mulmat-tune $(LDFLAGS)
+
 build-info.h: $(wildcard .git/index) scripts/build-info.sh
 @sh scripts/build-info.sh > $@.tmp
 @if ! cmp -s $@.tmp $@; then \
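Two of the Makefile changes work through preprocessor macros: `LLAMA_NO_TUNE` controls whether `-DGGML_USE_TUNE` is passed, and the `LLAMA_BLIS` branch passes `-DGGML_BLAS_VENDOR="\"BLIS\""`, whose escaped quotes survive make and the shell as a C string literal. How ggml-tune.c consumes these macros is not part of this diff, so the sketch below only shows the mechanism; `tune_enabled`, `blas_vendor_is`, and the `"unknown"` fallback are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* With LLAMA_BLIS, the shell turns the Makefile's -DGGML_BLAS_VENDOR="\"BLIS\""
   into -DGGML_BLAS_VENDOR="BLIS", so the macro expands to a string literal.
   The "unknown" default is a hypothetical fallback for other builds. */
#ifndef GGML_BLAS_VENDOR
#define GGML_BLAS_VENDOR "unknown"
#endif

/* "" GGML_BLAS_VENDOR only compiles if the macro is a string literal;
   the static_assert additionally rejects an empty vendor name. */
static_assert(sizeof("" GGML_BLAS_VENDOR) > 1,
              "GGML_BLAS_VENDOR must be a non-empty string literal");
static const char blas_vendor[] = "" GGML_BLAS_VENDOR;

/* In the Makefile build, GGML_USE_TUNE is defined unless LLAMA_NO_TUNE is set. */
static bool tune_enabled(void) {
#ifdef GGML_USE_TUNE
    return true;
#else
    return false;
#endif
}

static bool blas_vendor_is(const char *name) {
    return strcmp(blas_vendor, name) == 0;
}
```

Built with no extra flags, `tune_enabled()` is false and `blas_vendor_is("unknown")` holds. Built with `cc -c -DGGML_USE_TUNE '-DGGML_BLAS_VENDOR="BLIS"' vendor.c` (file name hypothetical), `tune_enabled()` is true and `blas_vendor_is("BLIS")` holds. Running `make LLAMA_NO_TUNE=1` defines the make variable, so the `ifndef LLAMA_NO_TUNE` block is skipped and `GGML_USE_TUNE` stays undefined.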
@@ -0,0 +1,14 @@
+set(TARGET mulmat-tune)
+add_executable(${TARGET} mulmat-tune.cpp)
+
+if (XCODE OR MSVC)
+set(MULMAT_TUNE_LIBS ggml)
+else()
+set(MULMAT_TUNE_LIBS ggml m)
+endif()
+
+target_link_libraries(${TARGET} PRIVATE ${MULMAT_TUNE_LIBS} ${CMAKE_THREAD_LIBS_INIT})
+target_compile_features(${TARGET} PRIVATE cxx_std_11)
+if(TARGET BUILD_INFO)
+add_dependencies(${TARGET} BUILD_INFO)
+endif()