[HIP] add --offload-compression-level= option #83605


Merged: 1 commit into llvm:main, Mar 9, 2024

Conversation

yxsamliu
Collaborator

@yxsamliu yxsamliu commented Mar 1, 2024

Added an --offload-compression-level= option to clang and a -compression-level=
option to clang-offload-bundler for controlling the compression level.

Added support for long-distance matching (LDM) to llvm::zstd; it is off
by default. It is enabled for clang-offload-bundler by default since it
generally improves the compression ratio.

Changed the default zstd compression level to 3 for clang-offload-bundler,
since it works well for bundle entry sizes from 1KB to 32MB, which should
cover most clang-offload-bundler usage. Users can still specify the
compression level with the -compression-level= option if necessary.

@yxsamliu yxsamliu requested review from MaskRay, Artem-B and jhuber6 March 1, 2024 18:30
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' llvm:support labels Mar 1, 2024
@llvmbot
Member

llvmbot commented Mar 1, 2024

@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-llvm-support

@llvm/pr-subscribers-clang

Author: Yaxun (Sam) Liu (yxsamliu)

Changes

Change compression level to 20 for zstd, for a better
compression rate.


Full diff: https://github.com/llvm/llvm-project/pull/83605.diff

4 Files Affected:

  • (modified) clang/lib/Driver/OffloadBundler.cpp (+12-4)
  • (modified) clang/test/Driver/clang-offload-bundler-zlib.c (+3-2)
  • (modified) clang/test/Driver/clang-offload-bundler-zstd.c (+2-1)
  • (modified) llvm/include/llvm/Support/Compression.h (+2)
diff --git a/clang/lib/Driver/OffloadBundler.cpp b/clang/lib/Driver/OffloadBundler.cpp
index 99a34d25cfcd56..ef48b2f33d3783 100644
--- a/clang/lib/Driver/OffloadBundler.cpp
+++ b/clang/lib/Driver/OffloadBundler.cpp
@@ -942,12 +942,19 @@ CompressedOffloadBundle::compress(const llvm::MemoryBuffer &Input,
       Input.getBuffer().size());
 
   llvm::compression::Format CompressionFormat;
+  int Level;
 
-  if (llvm::compression::zstd::isAvailable())
+  if (llvm::compression::zstd::isAvailable()) {
     CompressionFormat = llvm::compression::Format::Zstd;
-  else if (llvm::compression::zlib::isAvailable())
+    // Use a high zstd compress level by default for better size reduction.
+    const int DefaultZstdLevel = 20;
+    Level = DefaultZstdLevel;
+  } else if (llvm::compression::zlib::isAvailable()) {
     CompressionFormat = llvm::compression::Format::Zlib;
-  else
+    // Use default level for zlib since higher level does not have significant
+    // improvement.
+    Level = llvm::compression::zlib::DefaultCompression;
+  } else
     return createStringError(llvm::inconvertibleErrorCode(),
                              "Compression not supported");
 
@@ -955,7 +962,7 @@ CompressedOffloadBundle::compress(const llvm::MemoryBuffer &Input,
                             ClangOffloadBundlerTimerGroup);
   if (Verbose)
     CompressTimer.startTimer();
-  llvm::compression::compress(CompressionFormat, BufferUint8, CompressedBuffer);
+  llvm::compression::compress({CompressionFormat, Level}, BufferUint8, CompressedBuffer);
   if (Verbose)
     CompressTimer.stopTimer();
 
@@ -980,6 +987,7 @@ CompressedOffloadBundle::compress(const llvm::MemoryBuffer &Input,
         CompressionFormat == llvm::compression::Format::Zstd ? "zstd" : "zlib";
     llvm::errs() << "Compressed bundle format version: " << Version << "\n"
                  << "Compression method used: " << MethodUsed << "\n"
+                 << "Compression level: " << Level << "\n"
                  << "Binary size before compression: " << UncompressedSize
                  << " bytes\n"
                  << "Binary size after compression: " << CompressedBuffer.size()
diff --git a/clang/test/Driver/clang-offload-bundler-zlib.c b/clang/test/Driver/clang-offload-bundler-zlib.c
index a57ee6da9a86a6..aed3a03d5341b5 100644
--- a/clang/test/Driver/clang-offload-bundler-zlib.c
+++ b/clang/test/Driver/clang-offload-bundler-zlib.c
@@ -1,4 +1,4 @@
-// REQUIRES: zlib
+// REQUIRES: zlib && !zstd
 // REQUIRES: x86-registered-target
 // UNSUPPORTED: target={{.*}}-darwin{{.*}}, target={{.*}}-aix{{.*}}
 
@@ -34,7 +34,8 @@
 // RUN: diff %t.tgt2 %t.res.tgt2
 
 //
-// COMPRESS: Compression method used:
+// COMPRESS: Compression method used: zlib
+// COMPRESS: Compression level: 9
 // DECOMPRESS: Decompression method:
 // NOHOST-NOT: host-
 // NOHOST-DAG: hip-amdgcn-amd-amdhsa--gfx900
diff --git a/clang/test/Driver/clang-offload-bundler-zstd.c b/clang/test/Driver/clang-offload-bundler-zstd.c
index 3b577d4d166a3f..200634e05fa145 100644
--- a/clang/test/Driver/clang-offload-bundler-zstd.c
+++ b/clang/test/Driver/clang-offload-bundler-zstd.c
@@ -31,7 +31,8 @@
 // RUN: diff %t.tgt1 %t.res.tgt1
 // RUN: diff %t.tgt2 %t.res.tgt2
 //
-// COMPRESS: Compression method used
+// COMPRESS: Compression method used: zstd
+// COMPRESS: Compression level: 20
 // DECOMPRESS: Decompression method
 // NOHOST-NOT: host-
 // NOHOST-DAG: hip-amdgcn-amd-amdhsa--gfx900
diff --git a/llvm/include/llvm/Support/Compression.h b/llvm/include/llvm/Support/Compression.h
index c3ba3274d6ed87..9c9c3e2cc4a6c0 100644
--- a/llvm/include/llvm/Support/Compression.h
+++ b/llvm/include/llvm/Support/Compression.h
@@ -94,6 +94,8 @@ struct Params {
   constexpr Params(Format F)
       : format(F), level(F == Format::Zlib ? zlib::DefaultCompression
                                            : zstd::DefaultCompression) {}
+  constexpr Params(Format F, int L)
+      : format(F), level(L) {}
   Params(DebugCompressionType Type) : Params(formatFor(Type)) {}
 
   Format format;


github-actions bot commented Mar 1, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

CompressionFormat = llvm::compression::Format::Zstd;
else if (llvm::compression::zlib::isAvailable())
// Use a high zstd compress level by default for better size reduction.
const int DefaultZstdLevel = 20;
Member


What's the default compression level for zstd?

It would be great if we could override the compression level. I'm somewhat reluctant to impose max compression level on everyone by default, without any way out, if it turns out to be a problem.

@MaskRay WDYT?

Max compression level may be fine. If we produce enough stuff for compression to take long, compilation time itself will likely dwarf the compression time. For the small TUs, even slow compression may be fine.

@yxsamliu how long does the compilation w/o compression take in your benchmarks?

Collaborator Author


What's the default compression level for zstd?

The default compression level for zstd is 6.

It would be great if we could override the compression level. I'm somewhat reluctant to impose max compression level on everyone by default, without any way out, if it turns out to be a problem.

I can add a clang-offload-bundler option -compress-level=n and a clang option --offload-compress-level=n to control the compression level.

@MaskRay WDYT?

Max compression level may be fine. If we produce enough stuff for compression to take long, compilation time itself will likely dwarf the compression time. For the small TUs, even slow compression may be fine.

@yxsamliu how long does the compilation w/o compression take in your benchmarks?

For Blender 4.1, compiling kernels to bitcode for 6 GPUs takes 30s; compression with zstd level 20 takes 2s.

Member

@MaskRay MaskRay Mar 3, 2024


Delete this used-once variable, `const int DefaultZstdLevel = 20;`, which is also confusing since 20 is not the default.

High levels decrease compression speed substantially. Have you checked that 20 hits a good sweet spot between 9 to 22 (max)?

Is OffloadBundler used more like a linker than a compile action where parallelism helps? If yes, lld/ELF/OutputSections.cpp has some code, which can be copied here. (Please don't share the code, I think some duplication in lld/ELF makes lld code easier to change)

Collaborator Author


Level 20 is a sweet spot for both compression rate and compression time, as shown in the following table for Blender 4.1 bundled bitcode for 6 GPU archs:

| Zstd Level | Size Before (bytes) | Size After (bytes) | Compression Rate | Compression Time (s) | Decompression Time (s) |
|---|---|---|---|---|---|
| 6 | 68,459,756 | 32,612,291 | 2.10 | 0.8891 | 0.1809 |
| 9 | 68,459,756 | 31,445,373 | 2.18 | 1.4200 | 0.1742 |
| 15 | 68,459,756 | 28,063,493 | 2.44 | 9.7994 | 0.1712 |
| 18 | 68,459,756 | 24,952,891 | 2.74 | 11.4201 | 0.1796 |
| 19 | 68,459,756 | 24,690,733 | 2.77 | 13.4060 | 0.1820 |
| 20 | 68,459,756 | 4,394,993 | 15.58 | 2.0946 | 0.1320 |
| 21 | 68,459,756 | 4,394,399 | 15.59 | 2.1500 | 0.1318 |
| 22 | 68,459,756 | 4,394,429 | 15.59 | 2.6635 | 0.1309 |

Level 20 and level 19 have some compression parameter differences (https://github.com/facebook/zstd/blob/a58b48ef0e543980888a4d9d16c9072ff22135ca/lib/compress/clevels.h#L48); the meaning of these parameters is defined at https://github.com/facebook/zstd/blob/a58b48ef0e543980888a4d9d16c9072ff22135ca/lib/zstd.h#L1299. It seems either the largest match distance or the fully searched segment makes the difference.

clang-offload-bundler just concatenates the binaries for the different GPU archs together. Parallelization does not help much, unless zstd itself can be parallelized.

Member


compiling kernels to bitcode for 6 GPU takes 30s. compression with zstd level 20 takes 2s.

This looks acceptable for me.

unless zstd can be parallelized.

zstd does support multithreaded compression, but enabling it would run into the same issue we had with enabling multi-threaded compilation -- it will interfere with the build system's idea of resource usage.

@@ -1,4 +1,4 @@
// REQUIRES: zlib
// REQUIRES: zlib && !zstd
Member

@MaskRay MaskRay Mar 3, 2024


Since zstd configurations are becoming more popular, zlib && !zstd would essentially disable the test for increasingly more bots. But I guess this cannot be improved.

@yxsamliu yxsamliu changed the title [HIP] change compress level [HIP] add --offload-compression-level= option Mar 4, 2024
@@ -906,6 +906,16 @@ CreateFileHandler(MemoryBuffer &FirstInput,
}

OffloadBundlerConfig::OffloadBundlerConfig() {
if (llvm::compression::zstd::isAvailable()) {
CompressionFormat = llvm::compression::Format::Zstd;
// Use a high zstd compress level by default for better size reduction.
Member


I'd add more details here. While higher compression levels usually do improve compression ratio, in typical use case it's an incremental improvement. Here, we do it to achieve dramatic increase in compression ratio by exploiting the fact that we carry multiple sets of very similar large bitcode blobs, and that we need compression level high enough to fit one complete blob into compression window. At least that's the theory.

Should we print a warning (or just document it?) when the compression level ends up below what we'd expect? Considering that good compression starts at zstd-20, I suspect the compression ratio will go back to ~2.5x if the binary size for one GPU doubles and no longer fits the window. On top of that, compression time will also increase, a lot. That will be a rather unpleasant surprise for whoever runs into it.

ZSTD's current compression parameters are set this way:
https://github.com/facebook/zstd/blob/dev/lib/compress/clevels.h#L47

{ 23, 24, 22,  7,  3,256, ZSTD_btultra2},  /* level 19 */
{ 25, 25, 23,  7,  3,256, ZSTD_btultra2},  /* level 20 */

First three numbers are log2 of (largest match distance, fully searched segment, dispatch table).

2^25 = 32MB, which happens to be about the size of the single GPU binary in your example. I'm pretty sure this explains why zstd-20 works so well on it, while zstd-19 does not. It will work well for smaller binaries, but I'm pretty sure it will regress for a slightly larger binary.

I think it may be worth experimenting with fine-tuning compression settings and instead of blindly setting zstd-20, consider the size of the binary we need to deal with, and adjust only windowLog/chainLog appropriately.

Or we could set the default to lower compression level + large windowLog. This should still give us most of the compression benefits for the binaries that would fit into the window, but would avoid the performance cliff if the binary is too large.

I may be overcomplicating it too much, too. If someone does run into the problem, they now have a way to work around it by tweaking the compression level.

Member

@Artem-B Artem-B Mar 4, 2024


Also, I've just discovered that zstd already has a --long option:
https://github.com/facebook/zstd/blob/b293d2ebc3a5d29309390a70b3e7861b6f5133ec/lib/zstd.h#L394

    ZSTD_c_enableLongDistanceMatching=160, /* Enable long distance matching.
                                     * This parameter is designed to improve compression ratio
                                     * for large inputs, by finding large matches at long distance.
                                     * It increases memory usage and window size.
                                     * Note: enabling this parameter increases default ZSTD_c_windowLog to 128 MB
                                     * except when expressly set to a different value.
                                     * Note: will be enabled by default if ZSTD_c_windowLog >= 128 MB and
                                     * compression strategy >= ZSTD_btopt (== compression level 16+) */

This sounds like something we could use here.

Collaborator Author


Also, I've just discovered that zstd already has a --long option: https://github.com/facebook/zstd/blob/b293d2ebc3a5d29309390a70b3e7861b6f5133ec/lib/zstd.h#L394

    ZSTD_c_enableLongDistanceMatching=160, /* Enable long distance matching.
                                     * This parameter is designed to improve compression ratio
                                     * for large inputs, by finding large matches at long distance.
                                     * It increases memory usage and window size.
                                     * Note: enabling this parameter increases default ZSTD_c_windowLog to 128 MB
                                     * except when expressly set to a different value.
                                     * Note: will be enabled by default if ZSTD_c_windowLog >= 128 MB and
                                     * compression strategy >= ZSTD_btopt (== compression level 16+) */

This sounds like something we could use here.

Thanks, this option is promising. Here are some benchmark results for a fat binary containing 13 code objects, each of which is about 2.7MB.

The following data is without --long. The columns are compression level, original size -> compressed size (compression rate), compression speed, and decompression speed.

$ zstd -b1 -e22 -f --ultra tmp.o
 1#tmp.o             :  34864866 ->   9169246 (3.802), 657.0 MB/s ,1691.0 MB/s 
 2#tmp.o             :  34864866 ->   7352667 (4.742), 626.3 MB/s ,1903.8 MB/s 
 3#tmp.o             :  34864866 ->   6885718 (5.063), 488.1 MB/s ,1900.2 MB/s 
 4#tmp.o             :  34864866 ->   6700508 (5.203), 416.7 MB/s ,1897.2 MB/s 
 5#tmp.o             :  34864866 ->   6405252 (5.443), 236.4 MB/s ,1918.8 MB/s 
 6#tmp.o             :  34864866 ->   6336706 (5.502), 211.8 MB/s ,1941.4 MB/s 
 7#tmp.o             :  34864866 ->   6170409 (5.650), 153.5 MB/s ,2032.5 MB/s 
 8#tmp.o             :  34864866 ->   6121226 (5.696), 131.1 MB/s ,2071.5 MB/s 
 9#tmp.o             :  34864866 ->   6098948 (5.717), 124.9 MB/s ,2080.4 MB/s 
10#tmp.o             :  34864866 ->   2555599 (13.64), 179.4 MB/s ,3504.2 MB/s 
11#tmp.o             :  34864866 ->   2545375 (13.70), 119.4 MB/s ,3516.8 MB/s 
12#tmp.o             :  34864866 ->   2542711 (13.71), 107.2 MB/s ,3518.4 MB/s 
13#tmp.o             :  34864866 ->   2601619 (13.40),  58.4 MB/s ,3507.6 MB/s 
14#tmp.o             :  34864866 ->   2590656 (13.46),  46.2 MB/s ,3520.4 MB/s 
15#tmp.o             :  34864866 ->   2518599 (13.84),  28.4 MB/s ,3557.4 MB/s 
16#tmp.o             :  34864866 ->   2527122 (13.80),  20.8 MB/s ,3348.5 MB/s 
17#tmp.o             :  34864866 ->   2277125 (15.31),  19.0 MB/s ,3370.6 MB/s 
18#tmp.o             :  34864866 ->   2138918 (16.30),  15.0 MB/s ,3182.2 MB/s 
19#tmp.o             :  34864866 ->   2118238 (16.46),  8.82 MB/s ,3194.5 MB/s 
20#tmp.o             :  34864866 ->   2041007 (17.08),  8.31 MB/s ,3178.4 MB/s 
21#tmp.o             :  34864866 ->   2039075 (17.10),  5.21 MB/s ,3170.6 MB/s 
22#tmp.o             :  34864866 ->   2038568 (17.10),  3.60 MB/s ,3171.5 MB/s 

The following data are with --long:

$ zstd --long -b1 -e22 -f --ultra tmp.o
 1#tmp.o             :  34864866 ->   3281430 (10.62), 375.0 MB/s ,3531.9 MB/s 
 2#tmp.o             :  34864866 ->   2854143 (12.22), 360.6 MB/s ,3536.7 MB/s 
 3#tmp.o             :  34864866 ->   2648807 (13.16), 325.4 MB/s ,3462.7 MB/s 
 4#tmp.o             :  34864866 ->   2548618 (13.68), 309.6 MB/s ,3345.9 MB/s 
 5#tmp.o             :  34864866 ->   2540406 (13.72), 265.8 MB/s ,3297.8 MB/s 
 6#tmp.o             :  34864866 ->   2518788 (13.84), 251.9 MB/s ,3296.0 MB/s 
 7#tmp.o             :  34864866 ->   2451360 (14.22), 206.5 MB/s ,3446.9 MB/s 
 8#tmp.o             :  34864866 ->   2421083 (14.40), 186.5 MB/s ,3522.7 MB/s 
 9#tmp.o             :  34864866 ->   2406717 (14.49), 172.0 MB/s ,3472.2 MB/s 
10#tmp.o             :  34864866 ->   2392819 (14.57), 139.6 MB/s ,3439.4 MB/s 
11#tmp.o             :  34864866 ->   2386599 (14.61), 113.0 MB/s ,3415.2 MB/s 
12#tmp.o             :  34864866 ->   2385088 (14.62), 104.5 MB/s ,3430.0 MB/s 
13#tmp.o             :  34864866 ->   2389264 (14.59),  69.5 MB/s ,3422.9 MB/s 
14#tmp.o             :  34864866 ->   2382705 (14.63),  61.2 MB/s ,3428.6 MB/s 
15#tmp.o             :  34864866 ->   2372640 (14.69),  51.2 MB/s ,3446.7 MB/s 
16#tmp.o             :  34864866 ->   2209022 (15.78),  20.5 MB/s ,3483.3 MB/s 
17#tmp.o             :  34864866 ->   2168474 (16.08),  18.2 MB/s ,3381.5 MB/s 
18#tmp.o             :  34864866 ->   2065724 (16.88),  14.2 MB/s ,3187.8 MB/s 
19#tmp.o             :  34864866 ->   2042810 (17.07),  8.50 MB/s ,3195.7 MB/s 
20#tmp.o             :  34864866 ->   2040443 (17.09),  8.13 MB/s ,3173.0 MB/s 
21#tmp.o             :  34864866 ->   2038794 (17.10),  5.11 MB/s ,3174.3 MB/s 
22#tmp.o             :  34864866 ->   2038375 (17.10),  3.54 MB/s ,3177.8 MB/s 

From the data we can see that with --long, even at compression level 3, the compressed file size is below the size of one code object, while compression is very fast. Higher compression levels only improve the compression rate slightly, at much slower speed. Therefore compression level 3 seems more favorable.

@dwblaikie
Collaborator

level 20 is a sweet spot for both compression rate and compression time

I wonder how much this is overfitting for kernels of a particular size, though? (is it making the window just large enough that there's some "memory" from one kernel to the next - but a slightly larger kernel would cause it to fail to see the similarities of two or more kernels - and equally, would a lower level have a smaller window which would be adequate for smaller kernels?)

If level does imply window size and that does imply the size of the kernels that can be efficiently compressed across multiple similar copies - it might be interesting to know what range of kernel sizes fit in the new default, and if that size is representative of the majority of kernels.

@MaskRay
Member

MaskRay commented Mar 4, 2024

It may be worth asking on https://github.com/facebook/zstd/. I am sure the zstd maintainers would be happy to see more adoption. :)

@yxsamliu
Collaborator Author

yxsamliu commented Mar 7, 2024

Here is the size distribution of individual code object files for the ROCm 6.0 libraries (each code object file is for one GPU arch, and a fat binary contains a bunch of code object files; therefore the optimal compression parameters are mostly related to code object file size).

| Bin Size | Count | Percentage | Cumulative Percentage | Example File |
|---|---|---|---|---|
| 0-16K | 961 | 12.31% | 12.31% | librocrand.so#offset=27172864&size=0 |
| 16K-32K | 602 | 7.71% | 20.03% | librocalution_hip.so#offset=35438592&size=27264 |
| 32K-64K | 1463 | 18.75% | 38.77% | librocalution_hip.so#offset=32391168&size=37808 |
| 64K-128K | 1134 | 14.53% | 53.31% | libMIOpen.so#offset=566800384&size=98984 |
| 128K-256K | 897 | 11.49% | 64.80% | libMIOpen.so#offset=562827264&size=141624 |
| 256K-512K | 977 | 12.52% | 77.32% | libMIOpen.so#offset=659791872&size=504120 |
| 512K-1M | 482 | 6.18% | 83.50% | libMIOpen.so#offset=567713792&size=545032 |
| 1M-2M | 443 | 5.68% | 89.17% | libMIOpen.so#offset=569909248&size=1134632 |
| 2M-4M | 412 | 5.28% | 94.45% | librocrand.so#offset=27172864&size=2650696 |
| 4M-8M | 251 | 3.22% | 97.67% | librocblas.so#offset=1671168&size=5344160 |
| 8M-16M | 136 | 1.74% | 99.41% | librocblas.so#offset=389632000&size=15117200 |
| 16M-32M | 41 | 0.53% | 99.94% | librccl.so#offset=135168&size=20252464 |
| 32M-64M | 1 | 0.01% | 99.95% | TensileLibrary_Type_HH_HPA_Contraction_l_Alik_Bljk_Cijk_Dijk_gfx90a.co |
| 64M-128M | 4 | 0.05% | 100.00% | TensileLibrary_Type_HH_HPA_ExperimentalGrid_Contraction_l_Ailk_Bjlk_Cijk_Dijk_CU104_gfx90a.co |

From the table we can see that 99.9% of code object files are below 32MB, and all code object files are below 128MB.

@yxsamliu
Collaborator Author

yxsamliu commented Mar 7, 2024

It may be worth asking on https://github.com/facebook/zstd/. I am sure the zstd maintainers would be happy to see more adoption. :)

Posted a question to zstd: facebook/zstd#3932

@yxsamliu
Collaborator Author

yxsamliu commented Mar 8, 2024

The zstd developers suggested enabling long-distance matching (LDM), i.e. the --long option. I updated the PR with that change and tested that it works well for bundle entry sizes ranging from 1KB to 20MB, for both compression rate and compression/decompression speed.

@jhuber6
Contributor

jhuber6 commented Mar 8, 2024

Should an option like in #84337 be added for the new driver?

@yxsamliu
Collaborator Author

yxsamliu commented Mar 8, 2024

Should an option like in #84337 be added for the new driver?

Yes please

@yxsamliu
Collaborator Author

yxsamliu commented Mar 8, 2024

Should an option like in #84337 be added for the new driver?

Yes please

Oh. I can add it

@yxsamliu
Collaborator Author

yxsamliu commented Mar 8, 2024

Should an option like in #84337 be added for the new driver?

Added the option to the linker wrapper.

Contributor

@jhuber6 jhuber6 left a comment


Looks fine to me, I'll wait a bit to see if Artem or Fangrui have anything to add.

Member

@Artem-B Artem-B left a comment


LGTM.

TCArgs.getLastArg(options::OPT_offload_compression_level_EQ)) {
std::string CompressionLevelArg =
std::string("-compression-level=") + Arg->getValue();
CmdArgs.push_back(TCArgs.MakeArgString(CompressionLevelArg));
Member


This may be collapsed to just

CmdArgs.push_back(TCArgs.MakeArgString("-compression-level=" + Arg->getValue()));

Maybe with a Twine or StringRef wrapping the string literal.

Collaborator Author


will do

Added --offload-compression-level= option to clang and -compression-level=
option to clang-offload-bundler and clang-linker-wrapper for
controlling compression level.

Added support of long distance matching (LDM) for llvm::zstd which is off
by default. Enable it for clang-offload-bundler by default since it
improves compression rate in general.

Change default compression level to 3 for zstd for clang-offload-bundler
since it works well for bundle entry size from 1KB to 32MB, which should
cover most of the clang-offload-bundler usage. Users can still specify
compression level by -compression-level= option if necessary.

Change-Id: I5e2d7abcaa11a2a37a5e798476e3f572bba11cab
@yxsamliu yxsamliu merged commit 124d0b7 into llvm:main Mar 9, 2024
GZGavinZhao pushed a commit to GZGavinZhao/rocm-llvm-project that referenced this pull request Jun 21, 2024