[HIP] add --offload-compression-level= option #83605
Author: Yaxun (Sam) Liu (yxsamliu)

Changes: change the compression level to 20 for zstd for better size reduction.

Full diff: https://github.com/llvm/llvm-project/pull/83605.diff (4 files affected)
diff --git a/clang/lib/Driver/OffloadBundler.cpp b/clang/lib/Driver/OffloadBundler.cpp
index 99a34d25cfcd56..ef48b2f33d3783 100644
--- a/clang/lib/Driver/OffloadBundler.cpp
+++ b/clang/lib/Driver/OffloadBundler.cpp
@@ -942,12 +942,19 @@ CompressedOffloadBundle::compress(const llvm::MemoryBuffer &Input,
Input.getBuffer().size());
llvm::compression::Format CompressionFormat;
+ int Level;
- if (llvm::compression::zstd::isAvailable())
+ if (llvm::compression::zstd::isAvailable()) {
CompressionFormat = llvm::compression::Format::Zstd;
- else if (llvm::compression::zlib::isAvailable())
+ // Use a high zstd compress level by default for better size reduction.
+ const int DefaultZstdLevel = 20;
+ Level = DefaultZstdLevel;
+ } else if (llvm::compression::zlib::isAvailable()) {
CompressionFormat = llvm::compression::Format::Zlib;
- else
+ // Use default level for zlib since higher level does not have significant
+ // improvement.
+ Level = llvm::compression::zlib::DefaultCompression;
+ } else
return createStringError(llvm::inconvertibleErrorCode(),
"Compression not supported");
@@ -955,7 +962,7 @@ CompressedOffloadBundle::compress(const llvm::MemoryBuffer &Input,
ClangOffloadBundlerTimerGroup);
if (Verbose)
CompressTimer.startTimer();
- llvm::compression::compress(CompressionFormat, BufferUint8, CompressedBuffer);
+ llvm::compression::compress({CompressionFormat, Level}, BufferUint8, CompressedBuffer);
if (Verbose)
CompressTimer.stopTimer();
@@ -980,6 +987,7 @@ CompressedOffloadBundle::compress(const llvm::MemoryBuffer &Input,
CompressionFormat == llvm::compression::Format::Zstd ? "zstd" : "zlib";
llvm::errs() << "Compressed bundle format version: " << Version << "\n"
<< "Compression method used: " << MethodUsed << "\n"
+ << "Compression level: " << Level << "\n"
<< "Binary size before compression: " << UncompressedSize
<< " bytes\n"
<< "Binary size after compression: " << CompressedBuffer.size()
diff --git a/clang/test/Driver/clang-offload-bundler-zlib.c b/clang/test/Driver/clang-offload-bundler-zlib.c
index a57ee6da9a86a6..aed3a03d5341b5 100644
--- a/clang/test/Driver/clang-offload-bundler-zlib.c
+++ b/clang/test/Driver/clang-offload-bundler-zlib.c
@@ -1,4 +1,4 @@
-// REQUIRES: zlib
+// REQUIRES: zlib && !zstd
// REQUIRES: x86-registered-target
// UNSUPPORTED: target={{.*}}-darwin{{.*}}, target={{.*}}-aix{{.*}}
@@ -34,7 +34,8 @@
// RUN: diff %t.tgt2 %t.res.tgt2
//
-// COMPRESS: Compression method used:
+// COMPRESS: Compression method used: zlib
+// COMPRESS: Compression level: 9
// DECOMPRESS: Decompression method:
// NOHOST-NOT: host-
// NOHOST-DAG: hip-amdgcn-amd-amdhsa--gfx900
diff --git a/clang/test/Driver/clang-offload-bundler-zstd.c b/clang/test/Driver/clang-offload-bundler-zstd.c
index 3b577d4d166a3f..200634e05fa145 100644
--- a/clang/test/Driver/clang-offload-bundler-zstd.c
+++ b/clang/test/Driver/clang-offload-bundler-zstd.c
@@ -31,7 +31,8 @@
// RUN: diff %t.tgt1 %t.res.tgt1
// RUN: diff %t.tgt2 %t.res.tgt2
//
-// COMPRESS: Compression method used
+// COMPRESS: Compression method used: zstd
+// COMPRESS: Compression level: 20
// DECOMPRESS: Decompression method
// NOHOST-NOT: host-
// NOHOST-DAG: hip-amdgcn-amd-amdhsa--gfx900
diff --git a/llvm/include/llvm/Support/Compression.h b/llvm/include/llvm/Support/Compression.h
index c3ba3274d6ed87..9c9c3e2cc4a6c0 100644
--- a/llvm/include/llvm/Support/Compression.h
+++ b/llvm/include/llvm/Support/Compression.h
@@ -94,6 +94,8 @@ struct Params {
constexpr Params(Format F)
: format(F), level(F == Format::Zlib ? zlib::DefaultCompression
: zstd::DefaultCompression) {}
+ constexpr Params(Format F, int L)
+ : format(F), level(L) {}
Params(DebugCompressionType Type) : Params(formatFor(Type)) {}
Format format;
✅ With the latest revision this PR passed the C/C++ code formatter.
clang/lib/Driver/OffloadBundler.cpp (outdated)

```cpp
CompressionFormat = llvm::compression::Format::Zstd;
else if (llvm::compression::zlib::isAvailable())
// Use a high zstd compress level by default for better size reduction.
const int DefaultZstdLevel = 20;
```
What's the default compression level for zstd?
It would be great if we could override the compression level. I'm somewhat reluctant to impose max compression level on everyone by default, without any way out, if it turns out to be a problem.
@MaskRay WDYT?
Max compression level may be fine. If we produce enough stuff for compression to take long, compilation time itself will likely dwarf the compression time. For the small TUs, even slow compression may be fine.
@yxsamliu how long does compilation without compression take in your benchmarks?
> What's the default compression level for zstd?

The default compression level for zstd is 6.

> It would be great if we could override the compression level. I'm somewhat reluctant to impose max compression level on everyone by default, without any way out, if it turns out to be a problem.

I can add a clang-offload-bundler option `-compression-level=n` and a clang option `--offload-compress-level=n` to control the compression level.

> @MaskRay WDYT?
>
> Max compression level may be fine. If we produce enough stuff for compression to take long, compilation time itself will likely dwarf the compression time. For the small TUs, even slow compression may be fine.
>
> @yxsamliu how long does compilation without compression take in your benchmarks?

For Blender 4.1, compiling kernels to bitcode for 6 GPU archs takes 30s; compression with zstd level 20 takes 2s.
Delete this used-once variable `const int DefaultZstdLevel = 20;`, which is also confusing since 20 is not the default.

High levels decrease compression speed substantially. Have you checked that 20 hits a good sweet spot between 9 and 22 (max)?

Is OffloadBundler used more like a linker than a compile action, where parallelism helps? If yes, lld/ELF/OutputSections.cpp has some code which can be copied here. (Please don't share the code; I think some duplication in lld/ELF makes lld code easier to change.)
Level 20 is a sweet spot for both compression rate and compression time, as shown in the following table for Blender 4.1 bundled bitcode for 6 GPU archs:

| Zstd Level | Size Before (bytes) | Size After (bytes) | Compression Rate | Compression Time (s) | Decompression Time (s) |
|---|---|---|---|---|---|
| 6 | 68,459,756 | 32,612,291 | 2.10 | 0.8891 | 0.1809 |
| 9 | 68,459,756 | 31,445,373 | 2.18 | 1.4200 | 0.1742 |
| 15 | 68,459,756 | 28,063,493 | 2.44 | 9.7994 | 0.1712 |
| 18 | 68,459,756 | 24,952,891 | 2.74 | 11.4201 | 0.1796 |
| 19 | 68,459,756 | 24,690,733 | 2.77 | 13.4060 | 0.1820 |
| 20 | 68,459,756 | 4,394,993 | 15.58 | 2.0946 | 0.1320 |
| 21 | 68,459,756 | 4,394,399 | 15.59 | 2.1500 | 0.1318 |
| 22 | 68,459,756 | 4,394,429 | 15.59 | 2.6635 | 0.1309 |

Level 20 and level 19 have some compression parameter differences (https://github.com/facebook/zstd/blob/a58b48ef0e543980888a4d9d16c9072ff22135ca/lib/compress/clevels.h#L48); the meaning of these parameters is defined at https://github.com/facebook/zstd/blob/a58b48ef0e543980888a4d9d16c9072ff22135ca/lib/zstd.h#L1299. It seems either the largest match distance or the fully searched segment makes the difference.
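As a sanity check, the compression-rate column is just the ratio of the two size columns; a quick stdlib sketch, with the sizes copied from the table above:

```python
# Sizes (bytes) copied from the table above: zstd level -> compressed size.
before = 68_459_756
after = {6: 32_612_291, 9: 31_445_373, 15: 28_063_493, 20: 4_394_993}

# Compression rate = uncompressed size / compressed size, rounded to 2 places.
rates = {level: round(before / size, 2) for level, size in after.items()}
print(rates)  # {6: 2.1, 9: 2.18, 15: 2.44, 20: 15.58}
```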
clang-offload-bundler just concatenates the binaries for the different GPU archs together. Parallelization does not help much, unless zstd can be parallelized.
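The "just concatenates" layout can be sketched as a toy container (illustrative only: the real clang-offload-bundler header and magic differ; `TOYBUNDLE`, `bundle`, and `unbundle` are made up for this sketch):

```python
import io
import struct

MAGIC = b"TOYBUNDLE"  # hypothetical magic; not the real bundler's header

def bundle(entries):
    """Concatenate {triple: code object} pairs with length-prefixed records."""
    out = io.BytesIO()
    out.write(MAGIC + struct.pack("<I", len(entries)))
    for triple, blob in entries.items():
        name = triple.encode()
        out.write(struct.pack("<II", len(name), len(blob)) + name + blob)
    return out.getvalue()

def unbundle(data):
    """Walk the records back out into a {triple: code object} dict."""
    assert data[:len(MAGIC)] == MAGIC
    (count,) = struct.unpack_from("<I", data, len(MAGIC))
    off, entries = len(MAGIC) + 4, {}
    for _ in range(count):
        name_len, blob_len = struct.unpack_from("<II", data, off)
        off += 8
        triple = data[off:off + name_len].decode()
        off += name_len
        entries[triple] = data[off:off + blob_len]
        off += blob_len
    return entries
```

Because the per-arch blobs sit back to back like this, whole-bundle compression only deduplicates across archs if the compressor's window spans at least one blob.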
compiling kernels to bitcode for 6 GPU takes 30s. compression with zstd level 20 takes 2s.
This looks acceptable for me.
unless zstd can be parallelized.
zstd does support multithreaded compression, but enabling it would run into the same issue we had with enabling multi-threaded compilation -- it will interfere with the build system's idea of resource usage.
@@ -1,4 +1,4 @@
-// REQUIRES: zlib
+// REQUIRES: zlib && !zstd
Since zstd configurations are becoming more popular, `zlib && !zstd` would essentially disable the test for increasingly more bots. But I guess this cannot be improved.
clang/lib/Driver/OffloadBundler.cpp (outdated)

```cpp
@@ -906,6 +906,16 @@ CreateFileHandler(MemoryBuffer &FirstInput,
}

OffloadBundlerConfig::OffloadBundlerConfig() {
  if (llvm::compression::zstd::isAvailable()) {
    CompressionFormat = llvm::compression::Format::Zstd;
    // Use a high zstd compress level by default for better size reduction.
```
I'd add more details here. While higher compression levels usually do improve compression ratio, in the typical use case it's an incremental improvement. Here, we do it to achieve a dramatic increase in compression ratio by exploiting the fact that we carry multiple sets of very similar large bitcode blobs, and that we need a compression level high enough to fit one complete blob into the compression window. At least that's the theory.

Should we print a warning (or just document it?) when the compression level ends up below what we'd expect? Considering that good compression starts at zstd-20, I suspect that the compression ratio will go back to ~2.5x if the binary size for one GPU doubles and no longer fits the window. On top of that, compression time will also increase, a lot. That will be a rather unpleasant surprise for whoever runs into it.
ZSTD's current compression parameters are set this way:
https://github.com/facebook/zstd/blob/dev/lib/compress/clevels.h#L47

```c
{ 23, 24, 22, 7, 3,256, ZSTD_btultra2}, /* level 19 */
{ 25, 25, 23, 7, 3,256, ZSTD_btultra2}, /* level 20 */
```

The first three numbers are log2 of (largest match distance, fully searched segment, dispatch table). 2^25 = 32MB, which happens to be about the size of the single-GPU binary in your example. I'm pretty sure this explains why zstd-20 works so well on it, while zstd-19 does not. It will work well for smaller binaries, but I'm pretty sure it will regress for a slightly larger binary.
I think it may be worth experimenting with fine-tuning the compression settings: instead of blindly setting zstd-20, consider the size of the binary we need to deal with and adjust only windowLog/chainLog appropriately.

Or we could set the default to a lower compression level plus a large windowLog. This should still give us most of the compression benefits for the binaries that fit into the window, but would avoid the performance cliff if the binary is too large.

I may be overcomplicating it, too. If someone does run into the problem, they now have a way to work around it by tweaking the compression level.
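The window-size effect described above can be reproduced with stdlib codecs as stand-ins for zstd at small vs. large windowLog: DEFLATE's 32 KiB window cannot reach back to an earlier identical blob, while LZMA's default 8 MiB dictionary can.

```python
import lzma
import os
import zlib

blob = os.urandom(200_000)   # one incompressible "code object"
bundle = blob * 4            # fat binary: four identical copies back to back

small_window = zlib.compress(bundle, 9)   # 32 KiB window: earlier copies out of reach
large_window = lzma.compress(bundle)      # 8 MiB dictionary: later copies deduplicated

# The small window sees only noise; the large window stores roughly one copy.
assert len(small_window) > 3 * len(blob)
assert len(large_window) < 2 * len(blob)
```

Same input, wildly different results, purely because one window spans a whole blob and the other does not; this is the cliff being discussed.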
Also, I've just discovered that zstd already has a `--long` option:
https://github.com/facebook/zstd/blob/b293d2ebc3a5d29309390a70b3e7861b6f5133ec/lib/zstd.h#L394

```c
ZSTD_c_enableLongDistanceMatching=160, /* Enable long distance matching.
 * This parameter is designed to improve compression ratio
 * for large inputs, by finding large matches at long distance.
 * It increases memory usage and window size.
 * Note: enabling this parameter increases default ZSTD_c_windowLog to 128 MB
 * except when expressly set to a different value.
 * Note: will be enabled by default if ZSTD_c_windowLog >= 128 MB and
 * compression strategy >= ZSTD_btopt (== compression level 16+) */
```

This sounds like something we could use here.
> Also, I've just discovered that zstd already has a `--long` option: https://github.com/facebook/zstd/blob/b293d2ebc3a5d29309390a70b3e7861b6f5133ec/lib/zstd.h#L394
>
> This sounds like something we could use here.
Thanks, this option is promising. Here are some benchmark results for a fat binary containing 13 code objects, each of which is about 2.7MB.

The following data is without `--long`. The numbers are: compression level, original size -> compressed size (compression rate), compression speed, decompression speed.
```
$ zstd -b1 -e22 -f --ultra tmp.o
 1#tmp.o : 34864866 ->  9169246 (3.802),  657.0 MB/s , 1691.0 MB/s
 2#tmp.o : 34864866 ->  7352667 (4.742),  626.3 MB/s , 1903.8 MB/s
 3#tmp.o : 34864866 ->  6885718 (5.063),  488.1 MB/s , 1900.2 MB/s
 4#tmp.o : 34864866 ->  6700508 (5.203),  416.7 MB/s , 1897.2 MB/s
 5#tmp.o : 34864866 ->  6405252 (5.443),  236.4 MB/s , 1918.8 MB/s
 6#tmp.o : 34864866 ->  6336706 (5.502),  211.8 MB/s , 1941.4 MB/s
 7#tmp.o : 34864866 ->  6170409 (5.650),  153.5 MB/s , 2032.5 MB/s
 8#tmp.o : 34864866 ->  6121226 (5.696),  131.1 MB/s , 2071.5 MB/s
 9#tmp.o : 34864866 ->  6098948 (5.717),  124.9 MB/s , 2080.4 MB/s
10#tmp.o : 34864866 ->  2555599 (13.64),  179.4 MB/s , 3504.2 MB/s
11#tmp.o : 34864866 ->  2545375 (13.70),  119.4 MB/s , 3516.8 MB/s
12#tmp.o : 34864866 ->  2542711 (13.71),  107.2 MB/s , 3518.4 MB/s
13#tmp.o : 34864866 ->  2601619 (13.40),   58.4 MB/s , 3507.6 MB/s
14#tmp.o : 34864866 ->  2590656 (13.46),   46.2 MB/s , 3520.4 MB/s
15#tmp.o : 34864866 ->  2518599 (13.84),   28.4 MB/s , 3557.4 MB/s
16#tmp.o : 34864866 ->  2527122 (13.80),   20.8 MB/s , 3348.5 MB/s
17#tmp.o : 34864866 ->  2277125 (15.31),   19.0 MB/s , 3370.6 MB/s
18#tmp.o : 34864866 ->  2138918 (16.30),   15.0 MB/s , 3182.2 MB/s
19#tmp.o : 34864866 ->  2118238 (16.46),   8.82 MB/s , 3194.5 MB/s
20#tmp.o : 34864866 ->  2041007 (17.08),   8.31 MB/s , 3178.4 MB/s
21#tmp.o : 34864866 ->  2039075 (17.10),   5.21 MB/s , 3170.6 MB/s
22#tmp.o : 34864866 ->  2038568 (17.10),   3.60 MB/s , 3171.5 MB/s
```
The following data is with `--long`:
```
$ zstd --long -b1 -e22 -f --ultra tmp.o
 1#tmp.o : 34864866 ->  3281430 (10.62),  375.0 MB/s , 3531.9 MB/s
 2#tmp.o : 34864866 ->  2854143 (12.22),  360.6 MB/s , 3536.7 MB/s
 3#tmp.o : 34864866 ->  2648807 (13.16),  325.4 MB/s , 3462.7 MB/s
 4#tmp.o : 34864866 ->  2548618 (13.68),  309.6 MB/s , 3345.9 MB/s
 5#tmp.o : 34864866 ->  2540406 (13.72),  265.8 MB/s , 3297.8 MB/s
 6#tmp.o : 34864866 ->  2518788 (13.84),  251.9 MB/s , 3296.0 MB/s
 7#tmp.o : 34864866 ->  2451360 (14.22),  206.5 MB/s , 3446.9 MB/s
 8#tmp.o : 34864866 ->  2421083 (14.40),  186.5 MB/s , 3522.7 MB/s
 9#tmp.o : 34864866 ->  2406717 (14.49),  172.0 MB/s , 3472.2 MB/s
10#tmp.o : 34864866 ->  2392819 (14.57),  139.6 MB/s , 3439.4 MB/s
11#tmp.o : 34864866 ->  2386599 (14.61),  113.0 MB/s , 3415.2 MB/s
12#tmp.o : 34864866 ->  2385088 (14.62),  104.5 MB/s , 3430.0 MB/s
13#tmp.o : 34864866 ->  2389264 (14.59),   69.5 MB/s , 3422.9 MB/s
14#tmp.o : 34864866 ->  2382705 (14.63),   61.2 MB/s , 3428.6 MB/s
15#tmp.o : 34864866 ->  2372640 (14.69),   51.2 MB/s , 3446.7 MB/s
16#tmp.o : 34864866 ->  2209022 (15.78),   20.5 MB/s , 3483.3 MB/s
17#tmp.o : 34864866 ->  2168474 (16.08),   18.2 MB/s , 3381.5 MB/s
18#tmp.o : 34864866 ->  2065724 (16.88),   14.2 MB/s , 3187.8 MB/s
19#tmp.o : 34864866 ->  2042810 (17.07),   8.50 MB/s , 3195.7 MB/s
20#tmp.o : 34864866 ->  2040443 (17.09),   8.13 MB/s , 3173.0 MB/s
21#tmp.o : 34864866 ->  2038794 (17.10),   5.11 MB/s , 3174.3 MB/s
22#tmp.o : 34864866 ->  2038375 (17.10),   3.54 MB/s , 3177.8 MB/s
```
From the data we can see that with `--long`, even at compression level 3, the compressed file size is below the size of one code object, while the compression speed is very fast. Higher compression levels only improve the compression rate slightly, at a much slower speed. Therefore compression level 3 seems to be more favorable.
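The shape of those curves (ratio flattening while speed drops as the level rises) can be reproduced with any level-parameterized codec; a stdlib zlib sketch, as a stand-in for zstd, on repetitive input:

```python
import zlib

# Repetitive input, a crude stand-in for similar per-arch code objects.
data = b"hip-amdgcn-amd-amdhsa--gfx900 code object payload " * 4000
sizes = {level: len(zlib.compress(data, level)) for level in (1, 3, 6, 9)}

# Every level compresses this input well; the highest level is never worse
# than the lowest, but each step up mainly costs CPU time.
assert sizes[9] <= sizes[1]
assert all(size < len(data) for size in sizes.values())
```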
I wonder how much this is overfitting for kernels of a particular size, though? (Is it making the window just large enough that there's some "memory" from one kernel to the next, but a slightly larger kernel would cause it to fail to see the similarities of two or more kernels? And equally, would a lower level have a smaller window which would be adequate for smaller kernels?) If level does imply window size, and that does imply the size of the kernels that can be efficiently compressed across multiple similar copies, it might be interesting to know what range of kernel sizes fit in the new default, and whether that size is representative of the majority of kernels.
It may be worth asking on https://github.com/facebook/zstd/. I am sure zstd maintainers are happy to see more adoption :)
Here is the size distribution of individual code object files for the ROCm 6.0 libraries (each code object file is for one GPU arch, and a fat binary contains a bunch of code object files, therefore the optimal compression parameter is mostly related to code object file size).

From the table we can see that 99.9% of code object files are below 32MB. Also, all code object files are below 128MB.
Posted a question to zstd: facebook/zstd#3932
zstd developers suggest enabling long distance matching (LDM).
Should an option like in #84337 be added for the new driver?

Yes, please.

Oh. I can add it.

Added the option to the linker wrapper.
Looks fine to me; I'll wait a bit to see if Artem or Fangrui have anything to add.
LGTM.
```cpp
TCArgs.getLastArg(options::OPT_offload_compression_level_EQ)) {
  std::string CompressionLevelArg =
      std::string("-compression-level=") + Arg->getValue();
  CmdArgs.push_back(TCArgs.MakeArgString(CompressionLevelArg));
```
This may be collapsed to just `CmdArgs.push_back(TCArgs.MakeArgString("-compression-level=" + Arg->getValue()));`, maybe with a `Twine` or `StringRef` wrapping the string literal.
will do
Added --offload-compression-level= option to clang and -compression-level= option to clang-offload-bundler and clang-linker-wrapper for controlling compression level. Added support of long distance matching (LDM) for llvm::zstd which is off by default. Enable it for clang-offload-bundler by default since it improves compression rate in general. Change default compression level to 3 for zstd for clang-offload-bundler since it works well for bundle entry size from 1KB to 32MB, which should cover most of the clang-offload-bundler usage. Users can still specify compression level by -compression-level= option if necessary. Change-Id: I5e2d7abcaa11a2a37a5e798476e3f572bba11cab
Added --offload-compression-level= option to clang and -compression-level= option to clang-offload-bundler for controlling compression level. Added support of long distance matching (LDM) for llvm::zstd which is off by default. Enable it for clang-offload-bundler by default since it improves compression rate in general. Change default compression level to 3 for zstd for clang-offload-bundler since it works well for bundle entry size from 1KB to 32MB, which should cover most of the clang-offload-bundler usage. Users can still specify compression level by -compression-level= option if necessary. Signed-off-by: Gavin Zhao <[email protected]>