Commit 3075392

[Compression] Accelerate the code that decodes chars from var-length streams.
This change accelerates the code that decodes characters from bitstreams (represented as APInts) that are encoded with a variable-length encoding. The idea is simple: extract the lowest 64 bits and decode as many characters as possible before modifying the bignum. Decoding from a local 64-bit variable is much faster than repeatedly modifying the bignum. This commit makes decompression twice as fast as compression.
1 parent 2d7f6e7 commit 3075392
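
The batching idea from the commit message can be sketched outside of LLVM. The following is a minimal, self-contained illustration, not the code from this commit: `BigNum` is a hypothetical stand-in for `APInt`, and `decodeOne`/`encode` implement a made-up three-symbol prefix code ('a' = `0`, 'b' = `1,0`, 'c' = `1,1`, read from the low bit upward) in place of the real Huffman tables. As in the commit, a sentinel 1 bit sits above the payload, and the fast path is only taken while more than 64 bits are active.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical little-endian bignum standing in for APInt.
struct BigNum {
  std::vector<uint64_t> words{0};

  void setBit(unsigned pos) {
    if (pos / 64 >= words.size()) words.resize(pos / 64 + 1, 0);
    words[pos / 64] |= uint64_t(1) << (pos % 64);
  }
  // Index of the highest set bit plus one (0 for the value zero).
  unsigned activeBits() const {
    for (std::size_t i = words.size(); i-- > 0;)
      for (int b = 63; b >= 0; --b)
        if ((words[i] >> b) & 1) return unsigned(i * 64 + b + 1);
    return 0;
  }
  // Logical shift right; only shifts smaller than one word are needed here.
  void lshr(unsigned n) {
    assert(n < 64);
    if (n == 0) return;
    for (std::size_t i = 0; i < words.size(); ++i) {
      uint64_t hi = i + 1 < words.size() ? words[i + 1] : 0;
      words[i] = (words[i] >> n) | (hi << (64 - n));
    }
  }
};

// Toy prefix code (an assumption for this sketch), longest code is 2 bits.
constexpr unsigned LongestEncodingLength = 2;

// Decode one character from the low bits of w; returns (char, bits used).
std::pair<char, unsigned> decodeOne(uint64_t w) {
  if ((w & 1) == 0) return {'a', 1};
  return {(w & 2) ? 'c' : 'b', 2};
}

// Pack the codes from bit 0 upward and put a sentinel 1 bit on top.
BigNum encode(const std::string &s) {
  BigNum n;
  unsigned pos = 0;
  for (char c : s) {
    if (c == 'a') { pos += 1; continue; }  // code "0": nothing to set
    n.setBit(pos);                         // low bit is 1 for 'b' and 'c'
    if (c == 'c') n.setBit(pos + 1);       // second bit distinguishes 'c'
    pos += 2;
  }
  n.setBit(pos);  // sentinel
  return n;
}

std::string decode(BigNum num) {
  // Stop batching once another character might not fit in the local word.
  const unsigned MaxBitsPerWord = 64 - LongestEncodingLength;
  std::string out;
  while (num.activeBits() > 1) {  // same condition as num > 1
    uint64_t tail = num.words[0];
    unsigned bits = 0;
    if (num.activeBits() > 64) {
      // Fast path: the low word is all payload, so decode many characters
      // from the local copy and shift the bignum only once afterwards.
      while (bits < MaxBitsPerWord) {
        std::pair<char, unsigned> r = decodeOne(tail);
        out += r.first;
        tail >>= r.second;
        bits += r.second;
      }
    } else {
      // Slow path: near the sentinel, decode one character at a time.
      std::pair<char, unsigned> r = decodeOne(tail);
      out += r.first;
      bits = r.second;
    }
    num.lshr(bits);
  }
  return out;
}
```

Calling `decode(encode(s))` on a string over the alphabet {a, b, c} should return `s` unchanged. Because the inner loop stops at `64 - LongestEncodingLength` consumed bits, each batch shifts the bignum by at most 63 bits, which is why the single-word shift in this sketch is sufficient.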

3 files changed: 108 additions and 135 deletions

lib/ABI/Compression.cpp

Lines changed: 38 additions & 1 deletion
@@ -330,12 +330,49 @@ std::string swift::Compress::DecodeStringFromNumber(const APInt &In,
                                                     EncodingKind Kind) {
   APInt num = In;
   std::string sb;
+  // This is the max number of bits that we can hold in a 64bit number without
+  // overflowing in the next round of character decoding.
+  unsigned MaxBitsPerWord = 64 - Huffman::LongestEncodingLength;
 
   if (Kind == EncodingKind::Variable) {
     // Keep decoding until we reach our sentinel value.
     // See the encoder implementation for more details.
     while (num.ugt(1)) {
-      sb += Huffman::variable_decode(num);
+      // Try to decode a bunch of characters together without modifying the
+      // big number.
+      if (num.getActiveBits() > 64) {
+        // Collect the bottom 64-bit.
+        uint64_t tailbits = *num.getRawData();
+        // This variable is used to record the number of bits that were
+        // extracted from the lowest 64 bit of the big number.
+        unsigned bits = 0;
+
+        // Keep extracting bits from the tail of the APInt until you reach
+        // the end of the word (64 bits minus the size of the largest
+        // character possible).
+        while (bits < MaxBitsPerWord) {
+          char ch;
+          unsigned local_bits;
+          std::tie(ch, local_bits) = Huffman::variable_decode(tailbits);
+          sb += ch;
+          tailbits >>= local_bits;
+          bits += local_bits;
+        }
+        // Now that we've extracted a few characters from the tail of the APInt
+        // we need to shift the APInt and prepare for the next round. We shift
+        // the APInt by the number of bits that we extracted in the loop above.
+        num = num.lshr(bits);
+        bits = 0;
+      } else {
+        // We don't have enough bits in the big num in order to extract a few
+        // numbers at once, so just extract a single character.
+        uint64_t tailbits = *num.getRawData();
+        char ch;
+        unsigned bits;
+        std::tie(ch, bits) = Huffman::variable_decode(tailbits);
+        sb += ch;
+        num = num.lshr(bits);
+      }
     }
   } else {
     // Decode this number as a regular fixed-width sequence of characters.
