Skip to content

Commit f215844

Browse files
committed
gh-xxxx: overhaul build rules for optimized binaries
This commit overhauls the make-based build system's rules for building optimized binaries. Along the way it fixes a myriad of bugs and shortcomings with the prior approach. The old way of producing optimized binaries had various limitations: * `make [all]` would do work when PGO was enabled because the phony `profile-opt` rule was non-empty. This prevented no-op PGO builds from working at all. This meant workflows like `make; make install` either incurred extra work or failed due to race conditions. * Same thing for BOLT, as its `bolt-opt` rule was also non-empty and always ran during `make [all]`. * BOLT could not be run multiple times without a full rebuild because `llvm-bolt` can't instrument binaries that have already received BOLT optimizations. * It was difficult to run BOLT on its own because of how various make targets and their dependencies were structured. * I found the old way that configure and make communicated the default targets to be confusing and hard to understand. There are essentially 2 major changes going on in this commit: 1. A rework of the high-level make targets for performing a build and how they are defined. 2. A rework of all the make logic related to profile-based optimization (read: PGO and BOLT). Build Target Rework ====================== Before, we essentially had `build_all`, `profile-opt`, `bolt-opt` and `build_wasm` as our 3 targets for performing a build. `all` would alias to one of these, as appropriate. And there was another definition for which _simple_ make target to evaluate for non-optimized builds. This was likely `build_all` or `all`. In the rework, we introduce 2 new high-level targets: * `build-plain` - Perform a build without optimizations. * `build-optimized` - Perform a build with optimizations. `build-plain` is aliased to `build_all` in all configurations except WASM, where it is `build_wasm`. `build-optimized` by default is aliased to a target that prints an error message when optimizations aren't enabled. If PGO or BOLT are enabled, it is aliased to their respective target. `build-optimized` is the logical successor to `profile-opt`. I felt it best to delete `profile-opt` completely, as the new `build-*` high-level targets feel more friendly to use. But if people lament its loss, we can add a `profile-opt: build-optimized` to achieve almost the same result. Profiled-Based Optimization Rework ================================== Most of the make logic related to profile-based optimization (read: PGO and BOLT) has been touched in this change. A major issue with the old way of doing things was we used phony, always-executed make rules. This is a bad practice in make because it undermines no-op builds. Another issue is that the separation between the rules and what order they ran in wasn't always clear. Both PGO and BOLT consist of the same 4 phase solution: instrument, run, analyze, and apply. However, these steps weren't clearly expressed in the make logic. This is especially true for BOLT, which only had 1 make rule. Another issue with BOLT is that it was really easy to get things into a bad state. e.g. if you applied BOLT to `pythonX.Y` you could not run BOLT again unless you rebuilt `pythonX.Y` from source. In the new world, we have separate `profile-<tool>-<stage>-stamp` rules defining the 4 distinct `instrument`, `run`, `analyze`, and `apply` stages for both PGO and BOLT. Each of these stages is tracked by a _stamp_ semaphore file so progress can be captured. This should all be pretty straightforward. There is some minimal complexity here to handle BOLT's optional dependency on PGO, as BOLT either depends on `build_all` or `profile-pgo-apply-stamp`. As part of the refactor to BOLT we also preserve the original input binary before BOLT is applied. This original file is restored if BOLT runs again. This greatly simplifies repeated BOLT invocations, as make doesn't perform needless work. However, this is all best effort, as it is possible for some make target evaluations to still get things in a bad state. Other Remarks ============= If this change perturbs any bugs, they are likely around cleaning behavior. The cleaning rules are a bit complicated and not clearly documented. And I'm unsure which targets CPython developers often iterate on. It is highly possible that state cleanup of PGO and/or BOLT files isn't as robust as it needs to be. I explicitly deleted some calls to PGO cleanup because those calls prevented no-op `make [all]` from working. It is certainly possible something somewhere (release automation?) relied on these files being deleted when they no longer are. We still have targets to purge profile files and it should be trivial to add these to appropriate make rules.
1 parent 8eb0b50 commit f215844

File tree

4 files changed

+204
-95
lines changed

4 files changed

+204
-95
lines changed

.gitignore

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -122,8 +122,7 @@ Tools/unicode/data/
122122
# hendrikmuhs/ccache-action@v1
123123
/.ccache
124124
/platform
125-
/profile-clean-stamp
126-
/profile-run-stamp
125+
/profile-*-stamp
127126
/Python/deepfreeze/*.c
128127
/pybuilddir.txt
129128
/pyconfig.h

Makefile.pre.in

Lines changed: 127 additions & 51 deletions
Original file line numberDiff line numberDiff line change
@@ -601,13 +601,27 @@ LIBHACL_SHA2_HEADERS= \
601601
#########################################################################
602602
# Rules
603603

604-
# Default target
605-
all: @DEF_MAKE_ALL_RULE@
604+
# Default target.
605+
# Likely either `build-plain` or `build-optimized`.
606+
all: @MAKE_TARGET_ALL@
606607

607608
# First target in Makefile is implicit default. So .PHONY needs to come after
608609
# all.
609610
.PHONY: all
610611

612+
# Build without any optimizations or instrumented binaries.
613+
.PHONY: build-plain
614+
build-plain: @MAKE_TARGET_BUILD_PLAIN@
615+
616+
# Build with optimizations (PGO, BOLT, etc).
617+
.PHONY: build-optimized
618+
build-optimized: @MAKE_TARGET_BUILD_OPTIMIZED@
619+
620+
.PHONY: build-optimized-not-enabled
621+
build-optimized-not-enabled:
622+
@echo "build-optimized requires --enable-optimizations in configure; aborting"
623+
@exit 1
624+
611625
.PHONY: build_all
612626
build_all: check-clean-src $(BUILDPYTHON) platform sharedmods \
613627
gdbhooks Programs/_testembed scripts checksharedmods rundsymutil
@@ -629,69 +643,145 @@ check-clean-src:
629643
exit 1; \
630644
fi
631645

632-
# Profile generation build must start from a clean tree.
646+
# Profile-based optimization.
647+
#
648+
# PGO and BOLT profile-based optimization is supported. For each optimization,
649+
# roughly the following steps are done:
650+
#
651+
# 1. "Instrument" binaries with run-time data collection (e.g. build or modify
652+
# a variant of the binary.)
653+
# 2. "Run" instrumented binaries (via subset of test suite) to collect data.
654+
# 3. "Analyze" / collect / merge data files from previous step.
655+
# 4. "Apply" collected data from above. (e.g. rebuild or modify a binary).
656+
#
657+
# 0, 1, or multiple profile based optimizations can be enabled.
658+
#
659+
# We track the progress of profile-based optimization using various "stamp"
660+
# files. An empty stamp file tracks the stage of optimization we're in.
661+
# Each *-stamp rule that follows is defined in execution / dependency order.
662+
663+
# Remove files produced by or used for tracking profile-guided optimization.
664+
.PHONY: profile-remove
665+
profile-remove: clean-bolt
666+
find . -name '*.gc??' -exec rm -f {} ';'
667+
find . -name '*.profclang?' -exec rm -f {} ';'
668+
find . -name '*.dyn' -exec rm -f {} ';'
669+
rm -f $(COVERAGE_INFO)
670+
rm -rf $(COVERAGE_REPORT)
671+
# Remove all progress tracking stamps to ensure a clean slate.
672+
rm -f profile-*-stamp
673+
674+
# Profile-based optimization requires a fresh build environment.
633675
profile-clean-stamp:
634-
$(MAKE) clean
676+
$(MAKE) clean profile-remove
635677
touch $@
636678

637-
# Compile with profile generation enabled.
638-
profile-gen-stamp: profile-clean-stamp
679+
# Build with PGO instrumentation enabled.
680+
profile-pgo-instrument-stamp: profile-clean-stamp
639681
@if [ $(LLVM_PROF_ERR) = yes ]; then \
640682
echo "Error: Cannot perform PGO build because llvm-profdata was not found in PATH" ;\
641683
echo "Please add it to PATH and run ./configure again" ;\
642684
exit 1;\
643685
fi
644686
@echo "Building with support for profile generation:"
645-
$(MAKE) @DEF_MAKE_RULE@ CFLAGS_NODIST="$(CFLAGS_NODIST) $(PGO_PROF_GEN_FLAG)" LDFLAGS_NODIST="$(LDFLAGS_NODIST) $(PGO_PROF_GEN_FLAG)" LIBS="$(LIBS)"
687+
$(MAKE) @MAKE_TARGET_BUILD_PLAIN@ CFLAGS_NODIST="$(CFLAGS_NODIST) $(PGO_PROF_GEN_FLAG)" LDFLAGS_NODIST="$(LDFLAGS_NODIST) $(PGO_PROF_GEN_FLAG)" LIBS="$(LIBS)"
646688
touch $@
647689

648-
# Run task with profile generation build to create profile information.
649-
profile-run-stamp:
690+
# Run PGO instrumented binaries and collect profile data.
691+
profile-pgo-run-stamp: profile-pgo-instrument-stamp
650692
@echo "Running code to generate profile data (this can take a while):"
651-
# First, we need to create a clean build with profile generation
652-
# enabled.
653-
$(MAKE) profile-gen-stamp
654-
# Next, run the profile task to generate the profile information.
655693
@ # FIXME: can't run for a cross build
656694
$(LLVM_PROF_FILE) $(RUNSHARED) ./$(BUILDPYTHON) $(PROFILE_TASK) || true
695+
touch $@
696+
697+
# Collect data files produced by running PGO instrumented binaries.
698+
profile-pgo-analyze-stamp: profile-pgo-run-stamp
657699
$(LLVM_PROF_MERGER)
658700
# Remove profile generation binary since we are done with it.
659701
$(MAKE) clean-retain-profile
660-
# This is an expensive target to build and it does not have proper
661-
# makefile dependency information. So, we create a "stamp" file
662-
# to record its completion and avoid re-running it.
663702
touch $@
664703

665-
# Compile Python binary with profile guided optimization.
666-
# To force re-running of the profile task, remove the profile-run-stamp file.
667-
.PHONY: profile-opt
668-
profile-opt: profile-run-stamp
704+
# Use collected PGO data to influence rebuild of binaries.
705+
profile-pgo-apply-stamp: profile-pgo-analyze-stamp
669706
@echo "Rebuilding with profile guided optimizations:"
670-
-rm -f profile-clean-stamp
671-
$(MAKE) @DEF_MAKE_RULE@ CFLAGS_NODIST="$(CFLAGS_NODIST) $(PGO_PROF_USE_FLAG)" LDFLAGS_NODIST="$(LDFLAGS_NODIST)"
672-
673-
.PHONY: bolt-opt
674-
bolt-opt: @PREBOLT_RULE@
675-
rm -f *.fdata
676-
@if $(READELF) -p .note.bolt_info $(BUILDPYTHON) | grep BOLT > /dev/null; then\
677-
echo "skip: $(BUILDPYTHON) is already BOLTed."; \
678-
else \
679-
@LLVM_BOLT@ ./$(BUILDPYTHON) -instrument -instrumentation-file-append-pid -instrumentation-file=$(abspath $(BUILDPYTHON).bolt) -o $(BUILDPYTHON).bolt_inst; \
680-
./$(BUILDPYTHON).bolt_inst $(PROFILE_TASK) || true; \
681-
@MERGE_FDATA@ $(BUILDPYTHON).*.fdata > $(BUILDPYTHON).fdata; \
682-
@LLVM_BOLT@ ./$(BUILDPYTHON) -o $(BUILDPYTHON).bolt -data=$(BUILDPYTHON).fdata -update-debug-sections -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=none -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot; \
683-
rm -f *.fdata; \
684-
rm -f $(BUILDPYTHON).bolt_inst; \
685-
mv $(BUILDPYTHON).bolt $(BUILDPYTHON); \
707+
# Need to purge PGO instrumented build to force a rebuild.
708+
$(MAKE) clean-retain-profile
709+
$(MAKE) @MAKE_TARGET_BUILD_PLAIN@ CFLAGS_NODIST="$(CFLAGS_NODIST) $(PGO_PROF_USE_FLAG)" LDFLAGS_NODIST="$(LDFLAGS_NODIST)"
710+
touch $@
711+
712+
# BOLT supports instrumenting and applying changes to standalone binaries
713+
# without having to recompile.
714+
#
715+
# BOLT can run independently or in addition to PGO. If running with PGO,
716+
# it always runs after PGO. Care needs to be taken to preserve PGO state
717+
# when running BOLT so make doesn't re-apply PGO.
718+
#
719+
# BOLT also can't instrument binaries that have already had BOLT applied
720+
# to them. So we make an attempt to preserve and re-use the pristine
721+
# pre-BOLT binaries so developers can iterate on just BOLT optimization
722+
# passes.
723+
724+
# List of binaries that BOLT runs on.
725+
BOLT_BINARIES = $(BUILDPYTHON)
726+
727+
# Remove traces of bolt.
728+
.PHONY: clean-bolt
729+
clean-bolt:
730+
# Instrumented binaries.
731+
find . -name '*.bolt_inst' -exec rm -f {} ';'
732+
# The data files they produce.
733+
find . -name '*.fdata' -exec rm -f {} ';'
734+
# Copied of binaries before BOLT application.
735+
find . -name '*.prebolt' -exec rm -f {} ';'
736+
737+
# BOLTs dependencies are a bit wonky.
738+
#
739+
# If PGO is enabled, we can take a native rule dependency on a stamp file.
740+
# If PGO isn't enabled, we don't have a stamp to key off of and the phony
741+
# target (e.g. build_all) will always force rebuilds. So we call out to
742+
# make externally to sidestep the dependency.
743+
#
744+
# We can simplify this hack if we ever get stamp files for plain builds.
745+
profile-bolt-prebuild-stamp: @MAKE_BOLT_NATIVE_DEPENDENCY@
746+
if [ -n "@MAKE_BOLT_MAKE_DEPENDENCY@" ]; then \
747+
$(MAKE) @MAKE_BOLT_MAKE_DEPENDENCY@; \
686748
fi
749+
touch $@
687750

751+
profile-bolt-instrument-stamp: profile-bolt-prebuild-stamp
752+
for bin in $(BOLT_BINARIES); do \
753+
if [ -e "$${bin}.prebolt" ]; then \
754+
echo "Restoring pre-BOLT binary $${bin}.prebolt"; \
755+
mv "$${bin}.prebolt" "$${bin}"; \
756+
fi \
757+
done
758+
# Ensure prior BOLT state is purged.
759+
$(MAKE) clean-bolt
760+
@LLVM_BOLT@ ./$(BUILDPYTHON) -instrument -instrumentation-file-append-pid -instrumentation-file=$(abspath $(BUILDPYTHON).bolt) -o $(BUILDPYTHON).bolt_inst
761+
touch $@
762+
763+
profile-bolt-run-stamp: profile-bolt-instrument-stamp
764+
./$(BUILDPYTHON).bolt_inst $(PROFILE_TASK) || true
765+
touch $@
766+
767+
profile-bolt-analyze-stamp: profile-bolt-run-stamp
768+
@MERGE_FDATA@ $(BUILDPYTHON).*.fdata > $(BUILDPYTHON).fdata
769+
touch $@
770+
771+
profile-bolt-apply-stamp: profile-bolt-analyze-stamp
772+
@LLVM_BOLT@ ./$(BUILDPYTHON) -o $(BUILDPYTHON).bolt -data=$(BUILDPYTHON).fdata -update-debug-sections -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions=3 -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=all -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot
773+
mv $(BUILDPYTHON) $(BUILDPYTHON).prebolt
774+
mv $(BUILDPYTHON).bolt $(BUILDPYTHON)
775+
touch $@
776+
777+
# End of profile-based optimization rules.
688778

689779
# Compile and run with gcov
690780
.PHONY: coverage
691781
coverage:
692782
@echo "Building with support for coverage checking:"
693783
$(MAKE) clean
694-
$(MAKE) @DEF_MAKE_RULE@ CFLAGS="$(CFLAGS) -O0 -pg --coverage" LDFLAGS="$(LDFLAGS) --coverage"
784+
$(MAKE) @MAKE_SIMPLE_BUILD_TARGET@ CFLAGS="$(CFLAGS) -O0 -pg --coverage" LDFLAGS="$(LDFLAGS) --coverage"
695785

696786
.PHONY: coverage-lcov
697787
coverage-lcov:
@@ -2610,23 +2700,9 @@ clean-retain-profile: pycremoval
26102700
-rm -f Python/frozen_modules/MANIFEST
26112701
-find build -type f -a ! -name '*.gc??' -exec rm -f {} ';'
26122702
-rm -f Include/pydtrace_probes.h
2613-
-rm -f profile-gen-stamp
2614-
2615-
.PHONY: profile-removal
2616-
profile-removal:
2617-
find . -name '*.gc??' -exec rm -f {} ';'
2618-
find . -name '*.profclang?' -exec rm -f {} ';'
2619-
find . -name '*.dyn' -exec rm -f {} ';'
2620-
rm -f $(COVERAGE_INFO)
2621-
rm -rf $(COVERAGE_REPORT)
2622-
rm -f profile-run-stamp
26232703

26242704
.PHONY: clean
26252705
clean: clean-retain-profile
2626-
@if test @DEF_MAKE_ALL_RULE@ = profile-opt; then \
2627-
rm -f profile-gen-stamp profile-clean-stamp; \
2628-
$(MAKE) profile-removal; \
2629-
fi
26302706

26312707
.PHONY: clobber
26322708
clobber: clean

configure

Lines changed: 39 additions & 21 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

0 commit comments

Comments
 (0)