Skip to content

Bitstream Format

flanglet edited this page May 11, 2025 · 10 revisions

Introduction

A Kanzi bitstream usually starts with a header but can be leaderless.

In the latter case, a decompressor must know specific information about the bitstream in order to decompress it:

  • the bitstream version (less than 16)
  • the block size (between MIN_BITSTREAM_BLOCK_SIZE=1024 and MAX_BITSTREAM_BLOCK_SIZE=1024*0124*1024)
  • the name of the entropy codec
  • the list of names of the transforms (up to 8)
  • the checksum type (0, 32 or 64 as the number of bits).

Bitstream header format

The header is present once at the beginning of the stream.

The structure of the header for bitstream version 6 is described below.

bits role value comment name
32 magic number 0x4B414E5A 'KANZ'
4 bit stream version 6 current version bsVersion
2 block checksum 0 to 3 0: no checksum, 1: 32 bit, 2: 64 bits, 3: invalid
5 entropy see table one among 32 _entropyType
48 transforms see table up to 8 among 64 _transformType
28 block size / 16 in bytes block size between 1024 and 1024*1024*1024. _blockSize
2 bits for input size 0 to 3 not provided or >= 2^48: 0, <2^16: 1, <2^32: 2, <2^48: 3 szMask
0,16,32,48 original size(*) 0 to (2^48)-1 hint for the decompressor _outputSize
15 padding 0
24 header checksum 0 to (2^24)-1 see formula below, checked by the decompressor checksum

(*) If the input size is not available to the compressor, this field is skipped.

Formula to compute header checksum:

   // bsVersion, _entropyType, _transformType, _outputSize 
   // and _blockSize are all read from the header (see above)

   uint32 seed = 0x01030507 * uint32(bsVersion);
   const uint32 HASH = 0x1E35A7BD;
   uint32 cksum = HASH * seed;
   cksum ^= (HASH * uint32(~_entropyType));
   cksum ^= (HASH * uint32((~_transformType) >> 32));
   cksum ^= (HASH * uint32(~_transformType));
   cksum ^= (HASH * uint32(~_blockSize));

   if (szMask != 0) {
       cksum ^= (HASH * uint32((~_outputSize) >> 32));
       cksum ^= (HASH * uint32(~_outputSize));
   }

   cksum = (cksum >> 23) ^ (cksum >> 3);
   cksum &= ((1 << 24) - 1);

This checksum must be identical to the 24 bit checksum value at the end of the bitstream, otherwise the header is corrupted.

entropy types:

  • NONE_TYPE = 0; No compression
  • HUFFMAN_TYPE = 1; Huffman
  • FPAQ_TYPE = 2; Fast PAQ (order 0)
  • PAQ_TYPE = 3; PAQ (Obsolete, not supported)
  • RANGE_TYPE = 4; Range
  • ANS0_TYPE = 5; Asymmetric Numerical System order 0
  • CM_TYPE = 6; Context Model
  • TPAQ_TYPE = 7; Tangelo PAQ
  • ANS1_TYPE = 8; Asymmetric Numerical System order 1
  • TPAQX_TYPE = 9; Tangelo PAQ Extra

transform types:

  • NONE_TYPE = 0; Copy
  • BWT_TYPE = 1; Burrows Wheeler
  • BWTS_TYPE = 2; Burrows Wheeler Scott
  • LZ_TYPE = 3; Lempel Ziv
  • SNAPPY_TYPE = 4; Snappy (obsolete, not supported)
  • RLT_TYPE = 5; Run Length
  • ZRLT_TYPE = 6; Zero Run Length
  • MTFT_TYPE = 7; Move To Front
  • RANK_TYPE = 8; Rank
  • EXE_TYPE = 9; Codec for executables
  • DICT_TYPE = 10; Text codec
  • ROLZ_TYPE = 11; ROLZ codec
  • ROLZX_TYPE = 12; ROLZ Extra codec
  • SRT_TYPE = 13; Sorted Rank
  • LZP_TYPE = 14; Lempel Ziv Predict
  • MM_TYPE = 15; Multimedia (FSD) codec
  • LZX_TYPE = 16; Lempel Ziv Extra
  • UTF_TYPE = 17; UTF codec
  • PACK_TYPE = 18; Pack codec
  • DNA_TYPE = 19; DNA codec

block header format

After the header, one or many blocks must be present.

The bitstream must end with an empty block (which signals a valid end of stream).

Each block starts with a header with the following format:

bits optional? role comment
5 no log(cbs) - 3 block size <= 1GB
log(cbs) no cbs compressed block size (cbs), if 0 last block

The empty block is therefore just a 0 byte (five 0 bits for log(code block size)-3 and three 0 bits for the block size).

Each block header (except for the last empty block) is followed by block data.

bits optional? role comment
8 no block mode see format below
8 yes block transform skip flags present if more than 4 transforms only
8*ds no data ds = 1+((mode>>5)&0x03)
32 yes block checksum XXHash32 checksum
// mode | 0b10000000 => copy block
//      | 0b0yy00000 => size(size(block))-1
//      | 0b000y0000 => 1 if more than 4 transforms
//  case 4 transforms or less
//      | 0b0000yyyy => transform sequence skip flags (1 means skip)
//  case more than 4 transforms, read next byte
//      | 0byyyyyyyy => transform sequence skip flags (1 means skip)

To decode a block

  • Decode the block header
  • Read the block data
  • Entropy decode the data
  • Run sequentially every inverse transform on the decoded data (except those with a skip flag set to 1)
  • If a checksum was present, compute checksum of the decompressed data and compare with the checksum provided in the bitstream
Clone this wiki locally