aboutsummaryrefslogtreecommitdiff
path: root/sys/contrib/zstd/doc/zstd_compression_format.md
diff options
context:
space:
mode:
Diffstat (limited to 'sys/contrib/zstd/doc/zstd_compression_format.md')
-rw-r--r--sys/contrib/zstd/doc/zstd_compression_format.md1692
1 files changed, 0 insertions, 1692 deletions
diff --git a/sys/contrib/zstd/doc/zstd_compression_format.md b/sys/contrib/zstd/doc/zstd_compression_format.md
deleted file mode 100644
index fc09bd5538c5..000000000000
--- a/sys/contrib/zstd/doc/zstd_compression_format.md
+++ /dev/null
@@ -1,1692 +0,0 @@
-Zstandard Compression Format
-============================
-
-### Notices
-
-Copyright (c) 2016-2021 Yann Collet, Facebook, Inc.
-
-Permission is granted to copy and distribute this document
-for any purpose and without charge,
-including translations into other languages
-and incorporation into compilations,
-provided that the copyright notice and this notice are preserved,
-and that any substantive changes or deletions from the original
-are clearly marked.
-Distribution of this document is unlimited.
-
-### Version
-
-0.3.7 (2020-12-09)
-
-
-Introduction
-------------
-
-The purpose of this document is to define a lossless compressed data format,
-that is independent of CPU type, operating system,
-file system and character set, suitable for
-file compression, pipe and streaming compression,
-using the [Zstandard algorithm](http://www.zstandard.org).
-The text of the specification assumes a basic background in programming
-at the level of bits and other primitive data representations.
-
-The data can be produced or consumed,
-even for an arbitrarily long sequentially presented input data stream,
-using only an a priori bounded amount of intermediate storage,
-and hence can be used in data communications.
-The format uses the Zstandard compression method,
-and optional [xxHash-64 checksum method](http://www.xxhash.org),
-for detection of data corruption.
-
-The data format defined by this specification
-does not attempt to allow random access to compressed data.
-
-Unless otherwise indicated below,
-a compliant compressor must produce data sets
-that conform to the specifications presented here.
-It doesn’t need to support all options though.
-
-A compliant decompressor must be able to decompress
-at least one working set of parameters
-that conforms to the specifications presented here.
-It may also ignore informative fields, such as checksum.
-Whenever it does not support a parameter defined in the compressed stream,
-it must produce a non-ambiguous error code and associated error message
-explaining which parameter is unsupported.
-
-This specification is intended for use by implementers of software
-to compress data into Zstandard format and/or decompress data from Zstandard format.
-The Zstandard format is supported by an open source reference implementation,
-written in portable C, and available at : https://github.com/facebook/zstd .
-
-
-### Overall conventions
-In this document:
-- square brackets i.e. `[` and `]` are used to indicate optional fields or parameters.
-- the naming convention for identifiers is `Mixed_Case_With_Underscores`
-
-### Definitions
-Content compressed by Zstandard is transformed into a Zstandard __frame__.
-Multiple frames can be appended into a single file or stream.
-A frame is completely independent, has a defined beginning and end,
-and a set of parameters which tells the decoder how to decompress it.
-
-A frame encapsulates one or multiple __blocks__.
-Each block contains arbitrary content, which is described by its header,
-and has a guaranteed maximum content size, which depends on frame parameters.
-Unlike frames, each block depends on previous blocks for proper decoding.
-However, each block can be decompressed without waiting for its successor,
-allowing streaming operations.
-
-Overview
----------
-- [Frames](#frames)
- - [Zstandard frames](#zstandard-frames)
- - [Blocks](#blocks)
- - [Literals Section](#literals-section)
- - [Sequences Section](#sequences-section)
- - [Sequence Execution](#sequence-execution)
- - [Skippable frames](#skippable-frames)
-- [Entropy Encoding](#entropy-encoding)
- - [FSE](#fse)
- - [Huffman Coding](#huffman-coding)
-- [Dictionary Format](#dictionary-format)
-
-Frames
-------
-Zstandard compressed data is made of one or more __frames__.
-Each frame is independent and can be decompressed independently of other frames.
-The decompressed content of multiple concatenated frames is the concatenation of
-each frame decompressed content.
-
-There are two frame formats defined by Zstandard:
- Zstandard frames and Skippable frames.
-Zstandard frames contain compressed data, while
-skippable frames contain custom user metadata.
-
-## Zstandard frames
-The structure of a single Zstandard frame is following:
-
-| `Magic_Number` | `Frame_Header` |`Data_Block`| [More data blocks] | [`Content_Checksum`] |
-|:--------------:|:--------------:|:----------:| ------------------ |:--------------------:|
-| 4 bytes | 2-14 bytes | n bytes | | 0-4 bytes |
-
-__`Magic_Number`__
-
-4 Bytes, __little-endian__ format.
-Value : 0xFD2FB528
-Note: This value was selected to be less probable to find at the beginning of some random file.
-It avoids trivial patterns (0x00, 0xFF, repeated bytes, increasing bytes, etc.),
-contains byte values outside of ASCII range,
-and doesn't map into UTF8 space.
-It reduces the chances that a text file represent this value by accident.
-
-__`Frame_Header`__
-
-2 to 14 Bytes, detailed in [`Frame_Header`](#frame_header).
-
-__`Data_Block`__
-
-Detailed in [`Blocks`](#blocks).
-That’s where compressed data is stored.
-
-__`Content_Checksum`__
-
-An optional 32-bit checksum, only present if `Content_Checksum_flag` is set.
-The content checksum is the result
-of [xxh64() hash function](http://www.xxhash.org)
-digesting the original (decoded) data as input, and a seed of zero.
-The low 4 bytes of the checksum are stored in __little-endian__ format.
-
-### `Frame_Header`
-
-The `Frame_Header` has a variable size, with a minimum of 2 bytes,
-and up to 14 bytes depending on optional parameters.
-The structure of `Frame_Header` is following:
-
-| `Frame_Header_Descriptor` | [`Window_Descriptor`] | [`Dictionary_ID`] | [`Frame_Content_Size`] |
-| ------------------------- | --------------------- | ----------------- | ---------------------- |
-| 1 byte | 0-1 byte | 0-4 bytes | 0-8 bytes |
-
-#### `Frame_Header_Descriptor`
-
-The first header's byte is called the `Frame_Header_Descriptor`.
-It describes which other fields are present.
-Decoding this byte is enough to tell the size of `Frame_Header`.
-
-| Bit number | Field name |
-| ---------- | ---------- |
-| 7-6 | `Frame_Content_Size_flag` |
-| 5 | `Single_Segment_flag` |
-| 4 | `Unused_bit` |
-| 3 | `Reserved_bit` |
-| 2 | `Content_Checksum_flag` |
-| 1-0 | `Dictionary_ID_flag` |
-
-In this table, bit 7 is the highest bit, while bit 0 is the lowest one.
-
-__`Frame_Content_Size_flag`__
-
-This is a 2-bits flag (`= Frame_Header_Descriptor >> 6`),
-specifying if `Frame_Content_Size` (the decompressed data size)
-is provided within the header.
-`Flag_Value` provides `FCS_Field_Size`,
-which is the number of bytes used by `Frame_Content_Size`
-according to the following table:
-
-| `Flag_Value` | 0 | 1 | 2 | 3 |
-| -------------- | ------ | --- | --- | --- |
-|`FCS_Field_Size`| 0 or 1 | 2 | 4 | 8 |
-
-When `Flag_Value` is `0`, `FCS_Field_Size` depends on `Single_Segment_flag` :
-if `Single_Segment_flag` is set, `FCS_Field_Size` is 1.
-Otherwise, `FCS_Field_Size` is 0 : `Frame_Content_Size` is not provided.
-
-__`Single_Segment_flag`__
-
-If this flag is set,
-data must be regenerated within a single continuous memory segment.
-
-In this case, `Window_Descriptor` byte is skipped,
-but `Frame_Content_Size` is necessarily present.
-As a consequence, the decoder must allocate a memory segment
-of size equal or larger than `Frame_Content_Size`.
-
-In order to preserve the decoder from unreasonable memory requirements,
-a decoder is allowed to reject a compressed frame
-which requests a memory size beyond decoder's authorized range.
-
-For broader compatibility, decoders are recommended to support
-memory sizes of at least 8 MB.
-This is only a recommendation,
-each decoder is free to support higher or lower limits,
-depending on local limitations.
-
-__`Unused_bit`__
-
-A decoder compliant with this specification version shall not interpret this bit.
-It might be used in any future version,
-to signal a property which is transparent to properly decode the frame.
-An encoder compliant with this specification version must set this bit to zero.
-
-__`Reserved_bit`__
-
-This bit is reserved for some future feature.
-Its value _must be zero_.
-A decoder compliant with this specification version must ensure it is not set.
-This bit may be used in a future revision,
-to signal a feature that must be interpreted to decode the frame correctly.
-
-__`Content_Checksum_flag`__
-
-If this flag is set, a 32-bits `Content_Checksum` will be present at frame's end.
-See `Content_Checksum` paragraph.
-
-__`Dictionary_ID_flag`__
-
-This is a 2-bits flag (`= FHD & 3`),
-telling if a dictionary ID is provided within the header.
-It also specifies the size of this field as `DID_Field_Size`.
-
-|`Flag_Value` | 0 | 1 | 2 | 3 |
-| -------------- | --- | --- | --- | --- |
-|`DID_Field_Size`| 0 | 1 | 2 | 4 |
-
-#### `Window_Descriptor`
-
-Provides guarantees on minimum memory buffer required to decompress a frame.
-This information is important for decoders to allocate enough memory.
-
-The `Window_Descriptor` byte is optional.
-When `Single_Segment_flag` is set, `Window_Descriptor` is not present.
-In this case, `Window_Size` is `Frame_Content_Size`,
-which can be any value from 0 to 2^64-1 bytes (16 ExaBytes).
-
-| Bit numbers | 7-3 | 2-0 |
-| ----------- | ---------- | ---------- |
-| Field name | `Exponent` | `Mantissa` |
-
-The minimum memory buffer size is called `Window_Size`.
-It is described by the following formulas :
-```
-windowLog = 10 + Exponent;
-windowBase = 1 << windowLog;
-windowAdd = (windowBase / 8) * Mantissa;
-Window_Size = windowBase + windowAdd;
-```
-The minimum `Window_Size` is 1 KB.
-The maximum `Window_Size` is `(1<<41) + 7*(1<<38)` bytes, which is 3.75 TB.
-
-In general, larger `Window_Size` tend to improve compression ratio,
-but at the cost of memory usage.
-
-To properly decode compressed data,
-a decoder will need to allocate a buffer of at least `Window_Size` bytes.
-
-In order to preserve decoder from unreasonable memory requirements,
-a decoder is allowed to reject a compressed frame
-which requests a memory size beyond decoder's authorized range.
-
-For improved interoperability,
-it's recommended for decoders to support `Window_Size` of up to 8 MB,
-and it's recommended for encoders to not generate frame requiring `Window_Size` larger than 8 MB.
-It's merely a recommendation though,
-decoders are free to support larger or lower limits,
-depending on local limitations.
-
-#### `Dictionary_ID`
-
-This is a variable size field, which contains
-the ID of the dictionary required to properly decode the frame.
-`Dictionary_ID` field is optional. When it's not present,
-it's up to the decoder to know which dictionary to use.
-
-`Dictionary_ID` field size is provided by `DID_Field_Size`.
-`DID_Field_Size` is directly derived from value of `Dictionary_ID_flag`.
-1 byte can represent an ID 0-255.
-2 bytes can represent an ID 0-65535.
-4 bytes can represent an ID 0-4294967295.
-Format is __little-endian__.
-
-It's allowed to represent a small ID (for example `13`)
-with a large 4-bytes dictionary ID, even if it is less efficient.
-
-A value of `0` has same meaning as no `Dictionary_ID`,
-in which case the frame may or may not need a dictionary to be decoded,
-and the ID of such a dictionary is not specified.
-The decoder must know this information by other means.
-
-#### `Frame_Content_Size`
-
-This is the original (uncompressed) size. This information is optional.
-`Frame_Content_Size` uses a variable number of bytes, provided by `FCS_Field_Size`.
-`FCS_Field_Size` is provided by the value of `Frame_Content_Size_flag`.
-`FCS_Field_Size` can be equal to 0 (not present), 1, 2, 4 or 8 bytes.
-
-| `FCS_Field_Size` | Range |
-| ---------------- | ---------- |
-| 0 | unknown |
-| 1 | 0 - 255 |
-| 2 | 256 - 65791|
-| 4 | 0 - 2^32-1 |
-| 8 | 0 - 2^64-1 |
-
-`Frame_Content_Size` format is __little-endian__.
-When `FCS_Field_Size` is 1, 4 or 8 bytes, the value is read directly.
-When `FCS_Field_Size` is 2, _the offset of 256 is added_.
-It's allowed to represent a small size (for example `18`) using any compatible variant.
-
-
-Blocks
--------
-
-After `Magic_Number` and `Frame_Header`, there are some number of blocks.
-Each frame must have at least one block,
-but there is no upper limit on the number of blocks per frame.
-
-The structure of a block is as follows:
-
-| `Block_Header` | `Block_Content` |
-|:--------------:|:---------------:|
-| 3 bytes | n bytes |
-
-__`Block_Header`__
-
-`Block_Header` uses 3 bytes, written using __little-endian__ convention.
-It contains 3 fields :
-
-| `Last_Block` | `Block_Type` | `Block_Size` |
-|:------------:|:------------:|:------------:|
-| bit 0 | bits 1-2 | bits 3-23 |
-
-__`Last_Block`__
-
-The lowest bit signals if this block is the last one.
-The frame will end after this last block.
-It may be followed by an optional `Content_Checksum`
-(see [Zstandard Frames](#zstandard-frames)).
-
-__`Block_Type`__
-
-The next 2 bits represent the `Block_Type`.
-`Block_Type` influences the meaning of `Block_Size`.
-There are 4 block types :
-
-| Value | 0 | 1 | 2 | 3 |
-| ------------ | ----------- | ----------- | ------------------ | --------- |
-| `Block_Type` | `Raw_Block` | `RLE_Block` | `Compressed_Block` | `Reserved`|
-
-- `Raw_Block` - this is an uncompressed block.
- `Block_Content` contains `Block_Size` bytes.
-
-- `RLE_Block` - this is a single byte, repeated `Block_Size` times.
- `Block_Content` consists of a single byte.
- On the decompression side, this byte must be repeated `Block_Size` times.
-
-- `Compressed_Block` - this is a [Zstandard compressed block](#compressed-blocks),
- explained later on.
- `Block_Size` is the length of `Block_Content`, the compressed data.
- The decompressed size is not known,
- but its maximum possible value is guaranteed (see below)
-
-- `Reserved` - this is not a block.
- This value cannot be used with current version of this specification.
- If such a value is present, it is considered corrupted data.
-
-__`Block_Size`__
-
-The upper 21 bits of `Block_Header` represent the `Block_Size`.
-
-When `Block_Type` is `Compressed_Block` or `Raw_Block`,
-`Block_Size` is the size of `Block_Content` (hence excluding `Block_Header`).
-
-When `Block_Type` is `RLE_Block`, since `Block_Content`’s size is always 1,
-`Block_Size` represents the number of times this byte must be repeated.
-
-`Block_Size` is limited by `Block_Maximum_Size` (see below).
-
-__`Block_Content`__ and __`Block_Maximum_Size`__
-
-The size of `Block_Content` is limited by `Block_Maximum_Size`,
-which is the smallest of:
-- `Window_Size`
-- 128 KB
-
-`Block_Maximum_Size` is constant for a given frame.
-This maximum is applicable to both the decompressed size
-and the compressed size of any block in the frame.
-
-The reasoning for this limit is that a decoder can read this information
-at the beginning of a frame and use it to allocate buffers.
-The guarantees on the size of blocks ensure that
-the buffers will be large enough for any following block of the valid frame.
-
-
-Compressed Blocks
------------------
-To decompress a compressed block, the compressed size must be provided
-from `Block_Size` field within `Block_Header`.
-
-A compressed block consists of 2 sections :
-- [Literals Section](#literals-section)
-- [Sequences Section](#sequences-section)
-
-The results of the two sections are then combined to produce the decompressed
-data in [Sequence Execution](#sequence-execution)
-
-#### Prerequisites
-To decode a compressed block, the following elements are necessary :
-- Previous decoded data, up to a distance of `Window_Size`,
- or beginning of the Frame, whichever is smaller.
-- List of "recent offsets" from previous `Compressed_Block`.
-- The previous Huffman tree, required by `Treeless_Literals_Block` type
-- Previous FSE decoding tables, required by `Repeat_Mode`
- for each symbol type (literals lengths, match lengths, offsets)
-
-Note that decoding tables aren't always from the previous `Compressed_Block`.
-
-- Every decoding table can come from a dictionary.
-- The Huffman tree comes from the previous `Compressed_Literals_Block`.
-
-Literals Section
-----------------
-All literals are regrouped in the first part of the block.
-They can be decoded first, and then copied during [Sequence Execution],
-or they can be decoded on the flow during [Sequence Execution].
-
-Literals can be stored uncompressed or compressed using Huffman prefix codes.
-When compressed, an optional tree description can be present,
-followed by 1 or 4 streams.
-
-| `Literals_Section_Header` | [`Huffman_Tree_Description`] | [jumpTable] | Stream1 | [Stream2] | [Stream3] | [Stream4] |
-| ------------------------- | ---------------------------- | ----------- | ------- | --------- | --------- | --------- |
-
-
-### `Literals_Section_Header`
-
-Header is in charge of describing how literals are packed.
-It's a byte-aligned variable-size bitfield, ranging from 1 to 5 bytes,
-using __little-endian__ convention.
-
-| `Literals_Block_Type` | `Size_Format` | `Regenerated_Size` | [`Compressed_Size`] |
-| --------------------- | ------------- | ------------------ | ------------------- |
-| 2 bits | 1 - 2 bits | 5 - 20 bits | 0 - 18 bits |
-
-In this representation, bits on the left are the lowest bits.
-
-__`Literals_Block_Type`__
-
-This field uses 2 lowest bits of first byte, describing 4 different block types :
-
-| `Literals_Block_Type` | Value |
-| --------------------------- | ----- |
-| `Raw_Literals_Block` | 0 |
-| `RLE_Literals_Block` | 1 |
-| `Compressed_Literals_Block` | 2 |
-| `Treeless_Literals_Block` | 3 |
-
-- `Raw_Literals_Block` - Literals are stored uncompressed.
-- `RLE_Literals_Block` - Literals consist of a single byte value
- repeated `Regenerated_Size` times.
-- `Compressed_Literals_Block` - This is a standard Huffman-compressed block,
- starting with a Huffman tree description.
- See details below.
-- `Treeless_Literals_Block` - This is a Huffman-compressed block,
- using Huffman tree _from previous Huffman-compressed literals block_.
- `Huffman_Tree_Description` will be skipped.
- Note: If this mode is triggered without any previous Huffman-table in the frame
- (or [dictionary](#dictionary-format)), this should be treated as data corruption.
-
-__`Size_Format`__
-
-`Size_Format` is divided into 2 families :
-
-- For `Raw_Literals_Block` and `RLE_Literals_Block`,
- it's only necessary to decode `Regenerated_Size`.
- There is no `Compressed_Size` field.
-- For `Compressed_Block` and `Treeless_Literals_Block`,
- it's required to decode both `Compressed_Size`
- and `Regenerated_Size` (the decompressed size).
- It's also necessary to decode the number of streams (1 or 4).
-
-For values spanning several bytes, convention is __little-endian__.
-
-__`Size_Format` for `Raw_Literals_Block` and `RLE_Literals_Block`__ :
-
-`Size_Format` uses 1 _or_ 2 bits.
-Its value is : `Size_Format = (Literals_Section_Header[0]>>2) & 3`
-
-- `Size_Format` == 00 or 10 : `Size_Format` uses 1 bit.
- `Regenerated_Size` uses 5 bits (0-31).
- `Literals_Section_Header` uses 1 byte.
- `Regenerated_Size = Literals_Section_Header[0]>>3`
-- `Size_Format` == 01 : `Size_Format` uses 2 bits.
- `Regenerated_Size` uses 12 bits (0-4095).
- `Literals_Section_Header` uses 2 bytes.
- `Regenerated_Size = (Literals_Section_Header[0]>>4) + (Literals_Section_Header[1]<<4)`
-- `Size_Format` == 11 : `Size_Format` uses 2 bits.
- `Regenerated_Size` uses 20 bits (0-1048575).
- `Literals_Section_Header` uses 3 bytes.
- `Regenerated_Size = (Literals_Section_Header[0]>>4) + (Literals_Section_Header[1]<<4) + (Literals_Section_Header[2]<<12)`
-
-Only Stream1 is present for these cases.
-Note : it's allowed to represent a short value (for example `13`)
-using a long format, even if it's less efficient.
-
-__`Size_Format` for `Compressed_Literals_Block` and `Treeless_Literals_Block`__ :
-
-`Size_Format` always uses 2 bits.
-
-- `Size_Format` == 00 : _A single stream_.
- Both `Regenerated_Size` and `Compressed_Size` use 10 bits (0-1023).
- `Literals_Section_Header` uses 3 bytes.
-- `Size_Format` == 01 : 4 streams.
- Both `Regenerated_Size` and `Compressed_Size` use 10 bits (0-1023).
- `Literals_Section_Header` uses 3 bytes.
-- `Size_Format` == 10 : 4 streams.
- Both `Regenerated_Size` and `Compressed_Size` use 14 bits (0-16383).
- `Literals_Section_Header` uses 4 bytes.
-- `Size_Format` == 11 : 4 streams.
- Both `Regenerated_Size` and `Compressed_Size` use 18 bits (0-262143).
- `Literals_Section_Header` uses 5 bytes.
-
-Both `Compressed_Size` and `Regenerated_Size` fields follow __little-endian__ convention.
-Note: `Compressed_Size` __includes__ the size of the Huffman Tree description
-_when_ it is present.
-
-#### Raw Literals Block
-The data in Stream1 is `Regenerated_Size` bytes long,
-it contains the raw literals data to be used during [Sequence Execution].
-
-#### RLE Literals Block
-Stream1 consists of a single byte which should be repeated `Regenerated_Size` times
-to generate the decoded literals.
-
-#### Compressed Literals Block and Treeless Literals Block
-Both of these modes contain Huffman encoded data.
-
-For `Treeless_Literals_Block`,
-the Huffman table comes from previously compressed literals block,
-or from a dictionary.
-
-
-### `Huffman_Tree_Description`
-This section is only present when `Literals_Block_Type` type is `Compressed_Literals_Block` (`2`).
-The format of the Huffman tree description can be found at [Huffman Tree description](#huffman-tree-description).
-The size of `Huffman_Tree_Description` is determined during decoding process,
-it must be used to determine where streams begin.
-`Total_Streams_Size = Compressed_Size - Huffman_Tree_Description_Size`.
-
-
-### Jump Table
-The Jump Table is only present when there are 4 Huffman-coded streams.
-
-Reminder : Huffman compressed data consists of either 1 or 4 Huffman-coded streams.
-
-If only one stream is present, it is a single bitstream occupying the entire
-remaining portion of the literals block, encoded as described within
-[Huffman-Coded Streams](#huffman-coded-streams).
-
-If there are four streams, `Literals_Section_Header` only provided
-enough information to know the decompressed and compressed sizes
-of all four streams _combined_.
-The decompressed size of _each_ stream is equal to `(Regenerated_Size+3)/4`,
-except for the last stream which may be up to 3 bytes smaller,
-to reach a total decompressed size as specified in `Regenerated_Size`.
-
-The compressed size of each stream is provided explicitly in the Jump Table.
-Jump Table is 6 bytes long, and consist of three 2-byte __little-endian__ fields,
-describing the compressed sizes of the first three streams.
-`Stream4_Size` is computed from total `Total_Streams_Size` minus sizes of other streams.
-
-`Stream4_Size = Total_Streams_Size - 6 - Stream1_Size - Stream2_Size - Stream3_Size`.
-
-Note: if `Stream1_Size + Stream2_Size + Stream3_Size > Total_Streams_Size`,
-data is considered corrupted.
-
-Each of these 4 bitstreams is then decoded independently as a Huffman-Coded stream,
-as described at [Huffman-Coded Streams](#huffman-coded-streams)
-
-
-Sequences Section
------------------
-A compressed block is a succession of _sequences_ .
-A sequence is a literal copy command, followed by a match copy command.
-A literal copy command specifies a length.
-It is the number of bytes to be copied (or extracted) from the Literals Section.
-A match copy command specifies an offset and a length.
-
-When all _sequences_ are decoded,
-if there are literals left in the _literals section_,
-these bytes are added at the end of the block.
-
-This is described in more detail in [Sequence Execution](#sequence-execution).
-
-The `Sequences_Section` regroup all symbols required to decode commands.
-There are 3 symbol types : literals lengths, offsets and match lengths.
-They are encoded together, interleaved, in a single _bitstream_.
-
-The `Sequences_Section` starts by a header,
-followed by optional probability tables for each symbol type,
-followed by the bitstream.
-
-| `Sequences_Section_Header` | [`Literals_Length_Table`] | [`Offset_Table`] | [`Match_Length_Table`] | bitStream |
-| -------------------------- | ------------------------- | ---------------- | ---------------------- | --------- |
-
-To decode the `Sequences_Section`, it's required to know its size.
-Its size is deduced from the size of `Literals_Section`:
-`Sequences_Section_Size = Block_Size - Literals_Section_Size`.
-
-
-#### `Sequences_Section_Header`
-
-Consists of 2 items:
-- `Number_of_Sequences`
-- Symbol compression modes
-
-__`Number_of_Sequences`__
-
-This is a variable size field using between 1 and 3 bytes.
-Let's call its first byte `byte0`.
-- `if (byte0 == 0)` : there are no sequences.
- The sequence section stops there.
- Decompressed content is defined entirely as Literals Section content.
- The FSE tables used in `Repeat_Mode` aren't updated.
-- `if (byte0 < 128)` : `Number_of_Sequences = byte0` . Uses 1 byte.
-- `if (byte0 < 255)` : `Number_of_Sequences = ((byte0-128) << 8) + byte1` . Uses 2 bytes.
-- `if (byte0 == 255)`: `Number_of_Sequences = byte1 + (byte2<<8) + 0x7F00` . Uses 3 bytes.
-
-__Symbol compression modes__
-
-This is a single byte, defining the compression mode of each symbol type.
-
-|Bit number| 7-6 | 5-4 | 3-2 | 1-0 |
-| -------- | ----------------------- | -------------- | -------------------- | ---------- |
-|Field name| `Literals_Lengths_Mode` | `Offsets_Mode` | `Match_Lengths_Mode` | `Reserved` |
-
-The last field, `Reserved`, must be all-zeroes.
-
-`Literals_Lengths_Mode`, `Offsets_Mode` and `Match_Lengths_Mode` define the `Compression_Mode` of
-literals lengths, offsets, and match lengths symbols respectively.
-
-They follow the same enumeration :
-
-| Value | 0 | 1 | 2 | 3 |
-| ------------------ | ----------------- | ---------- | --------------------- | ------------- |
-| `Compression_Mode` | `Predefined_Mode` | `RLE_Mode` | `FSE_Compressed_Mode` | `Repeat_Mode` |
-
-- `Predefined_Mode` : A predefined FSE distribution table is used, defined in
- [default distributions](#default-distributions).
- No distribution table will be present.
-- `RLE_Mode` : The table description consists of a single byte, which contains the symbol's value.
- This symbol will be used for all sequences.
-- `FSE_Compressed_Mode` : standard FSE compression.
- A distribution table will be present.
- The format of this distribution table is described in [FSE Table Description](#fse-table-description).
- Note that the maximum allowed accuracy log for literals length and match length tables is 9,
- and the maximum accuracy log for the offsets table is 8.
- `FSE_Compressed_Mode` must not be used when only one symbol is present,
- `RLE_Mode` should be used instead (although any other mode will work).
-- `Repeat_Mode` : The table used in the previous `Compressed_Block` with `Number_of_Sequences > 0` will be used again,
- or if this is the first block, table in the dictionary will be used.
- Note that this includes `RLE_mode`, so if `Repeat_Mode` follows `RLE_Mode`, the same symbol will be repeated.
- It also includes `Predefined_Mode`, in which case `Repeat_Mode` will have same outcome as `Predefined_Mode`.
- No distribution table will be present.
- If this mode is used without any previous sequence table in the frame
- (nor [dictionary](#dictionary-format)) to repeat, this should be treated as corruption.
-
-#### The codes for literals lengths, match lengths, and offsets.
-
-Each symbol is a _code_ in its own context,
-which specifies `Baseline` and `Number_of_Bits` to add.
-_Codes_ are FSE compressed,
-and interleaved with raw additional bits in the same bitstream.
-
-##### Literals length codes
-
-Literals length codes are values ranging from `0` to `35` included.
-They define lengths from 0 to 131071 bytes.
-The literals length is equal to the decoded `Baseline` plus
-the result of reading `Number_of_Bits` bits from the bitstream,
-as a __little-endian__ value.
-
-| `Literals_Length_Code` | 0-15 |
-| ---------------------- | ---------------------- |
-| length | `Literals_Length_Code` |
-| `Number_of_Bits` | 0 |
-
-| `Literals_Length_Code` | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
-| ---------------------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
-| `Baseline` | 16 | 18 | 20 | 22 | 24 | 28 | 32 | 40 |
-| `Number_of_Bits` | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 3 |
-
-| `Literals_Length_Code` | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
-| ---------------------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
-| `Baseline` | 48 | 64 | 128 | 256 | 512 | 1024 | 2048 | 4096 |
-| `Number_of_Bits` | 4 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
-
-| `Literals_Length_Code` | 32 | 33 | 34 | 35 |
-| ---------------------- | ---- | ---- | ---- | ---- |
-| `Baseline` | 8192 |16384 |32768 |65536 |
-| `Number_of_Bits` | 13 | 14 | 15 | 16 |
-
-
-##### Match length codes
-
-Match length codes are values ranging from `0` to `52` included.
-They define lengths from 3 to 131074 bytes.
-The match length is equal to the decoded `Baseline` plus
-the result of reading `Number_of_Bits` bits from the bitstream,
-as a __little-endian__ value.
-
-| `Match_Length_Code` | 0-31 |
-| ------------------- | ----------------------- |
-| value | `Match_Length_Code` + 3 |
-| `Number_of_Bits` | 0 |
-
-| `Match_Length_Code` | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 |
-| ------------------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
-| `Baseline` | 35 | 37 | 39 | 41 | 43 | 47 | 51 | 59 |
-| `Number_of_Bits` | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 3 |
-
-| `Match_Length_Code` | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 |
-| ------------------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
-| `Baseline` | 67 | 83 | 99 | 131 | 259 | 515 | 1027 | 2051 |
-| `Number_of_Bits` | 4 | 4 | 5 | 7 | 8 | 9 | 10 | 11 |
-
-| `Match_Length_Code` | 48 | 49 | 50 | 51 | 52 |
-| ------------------- | ---- | ---- | ---- | ---- | ---- |
-| `Baseline` | 4099 | 8195 |16387 |32771 |65539 |
-| `Number_of_Bits` | 12 | 13 | 14 | 15 | 16 |
-
-##### Offset codes
-
-Offset codes are values ranging from `0` to `N`.
-
-A decoder is free to limit its maximum `N` supported.
-Recommendation is to support at least up to `22`.
-For information, at the time of this writing.
-the reference decoder supports a maximum `N` value of `31`.
-
-An offset code is also the number of additional bits to read in __little-endian__ fashion,
-and can be translated into an `Offset_Value` using the following formulas :
-
-```
-Offset_Value = (1 << offsetCode) + readNBits(offsetCode);
-if (Offset_Value > 3) offset = Offset_Value - 3;
-```
-It means that maximum `Offset_Value` is `(2^(N+1))-1`
-supporting back-reference distances up to `(2^(N+1))-4`,
-but is limited by [maximum back-reference distance](#window_descriptor).
-
-`Offset_Value` from 1 to 3 are special : they define "repeat codes".
-This is described in more detail in [Repeat Offsets](#repeat-offsets).
-
-#### Decoding Sequences
-FSE bitstreams are read in reverse direction than written. In zstd,
-the compressor writes bits forward into a block and the decompressor
-must read the bitstream _backwards_.
-
-To find the start of the bitstream it is therefore necessary to
-know the offset of the last byte of the block which can be found
-by counting `Block_Size` bytes after the block header.
-
-After writing the last bit containing information, the compressor
-writes a single `1`-bit and then fills the byte with 0-7 `0` bits of
-padding. The last byte of the compressed bitstream cannot be `0` for
-that reason.
-
-When decompressing, the last byte containing the padding is the first
-byte to read. The decompressor needs to skip 0-7 initial `0`-bits and
-the first `1`-bit it occurs. Afterwards, the useful part of the bitstream
-begins.
-
-FSE decoding requires a 'state' to be carried from symbol to symbol.
-For more explanation on FSE decoding, see the [FSE section](#fse).
-
-For sequence decoding, a separate state keeps track of each
-literal lengths, offsets, and match lengths symbols.
-Some FSE primitives are also used.
-For more details on the operation of these primitives, see the [FSE section](#fse).
-
-##### Starting states
-The bitstream starts with initial FSE state values,
-each using the required number of bits in their respective _accuracy_,
-decoded previously from their normalized distribution.
-
-It starts by `Literals_Length_State`,
-followed by `Offset_State`,
-and finally `Match_Length_State`.
-
-Reminder : always keep in mind that all values are read _backward_,
-so the 'start' of the bitstream is at the highest position in memory,
-immediately before the last `1`-bit for padding.
-
-After decoding the starting states, a single sequence is decoded
-`Number_Of_Sequences` times.
-These sequences are decoded in order from first to last.
-Since the compressor writes the bitstream in the forward direction,
-this means the compressor must encode the sequences starting with the last
-one and ending with the first.
-
-##### Decoding a sequence
-For each of the symbol types, the FSE state can be used to determine the appropriate code.
-The code then defines the `Baseline` and `Number_of_Bits` to read for each type.
-See the [description of the codes] for how to determine these values.
-
-[description of the codes]: #the-codes-for-literals-lengths-match-lengths-and-offsets
-
-Decoding starts by reading the `Number_of_Bits` required to decode `Offset`.
-It then does the same for `Match_Length`, and then for `Literals_Length`.
-This sequence is then used for [sequence execution](#sequence-execution).
-
-If it is not the last sequence in the block,
-the next operation is to update states.
-Using the rules pre-calculated in the decoding tables,
-`Literals_Length_State` is updated,
-followed by `Match_Length_State`,
-and then `Offset_State`.
-See the [FSE section](#fse) for details on how to update states from the bitstream.
-
-This operation will be repeated `Number_of_Sequences` times.
-At the end, the bitstream shall be entirely consumed,
-otherwise the bitstream is considered corrupted.
-
-#### Default Distributions
-If `Predefined_Mode` is selected for a symbol type,
-its FSE decoding table is generated from a predefined distribution table defined here.
-For details on how to convert this distribution into a decoding table, see the [FSE section].
-
-[FSE section]: #from-normalized-distribution-to-decoding-tables
-
-##### Literals Length
-The decoding table uses an accuracy log of 6 bits (64 states).
-```
-short literalsLength_defaultDistribution[36] =
- { 4, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1,
- 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 1, 1, 1, 1, 1,
- -1,-1,-1,-1 };
-```
-
-##### Match Length
-The decoding table uses an accuracy log of 6 bits (64 states).
-```
-short matchLengths_defaultDistribution[53] =
- { 1, 4, 3, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1,
- 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
- 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,-1,-1,
- -1,-1,-1,-1,-1 };
-```
-
-##### Offset Codes
-The decoding table uses an accuracy log of 5 bits (32 states),
-and supports a maximum `N` value of 28, allowing offset values up to 536,870,908 .
-
-If any sequence in the compressed block requires a larger offset than this,
-it's not possible to use the default distribution to represent it.
-```
-short offsetCodes_defaultDistribution[29] =
- { 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1,
- 1, 1, 1, 1, 1, 1, 1, 1,-1,-1,-1,-1,-1 };
-```
-
-
-Sequence Execution
-------------------
-Once literals and sequences have been decoded,
-they are combined to produce the decoded content of a block.
-
-Each sequence consists of a tuple of (`literals_length`, `offset_value`, `match_length`),
-decoded as described in the [Sequences Section](#sequences-section).
-To execute a sequence, first copy `literals_length` bytes
-from the decoded literals to the output.
-
-Then `match_length` bytes are copied from previous decoded data.
-The offset to copy from is determined by `offset_value`:
-if `offset_value > 3`, then the offset is `offset_value - 3`.
-If `offset_value` is from 1-3, the offset is a special repeat offset value.
-See the [repeat offset](#repeat-offsets) section for how the offset is determined
-in this case.
-
-The offset is defined as from the current position, so an offset of 6
-and a match length of 3 means that 3 bytes should be copied from 6 bytes back.
-Note that all offsets leading to previously decoded data
-must be smaller than `Window_Size` defined in `Frame_Header_Descriptor`.
-
-#### Repeat offsets
-As seen in [Sequence Execution](#sequence-execution),
-the first 3 values define a repeated offset and we will call them
-`Repeated_Offset1`, `Repeated_Offset2`, and `Repeated_Offset3`.
-They are sorted in recency order, with `Repeated_Offset1` meaning "most recent one".
-
-If `offset_value == 1`, then the offset used is `Repeated_Offset1`, etc.
-
-There is an exception though, when current sequence's `literals_length = 0`.
-In this case, repeated offsets are shifted by one,
-so an `offset_value` of 1 means `Repeated_Offset2`,
-an `offset_value` of 2 means `Repeated_Offset3`,
-and an `offset_value` of 3 means `Repeated_Offset1 - 1_byte`.
-
-For the first block, the starting offset history is populated with following values :
-`Repeated_Offset1`=1, `Repeated_Offset2`=4, `Repeated_Offset3`=8,
-unless a dictionary is used, in which case they come from the dictionary.
-
-Then each block gets its starting offset history from the ending values of the most recent `Compressed_Block`.
-Note that blocks which are not `Compressed_Block` are skipped, they do not contribute to offset history.
-
-[Offset Codes]: #offset-codes
-
-###### Offset updates rules
-
-During the execution of the sequences of a `Compressed_Block`, the
-`Repeated_Offsets`' values are kept up to date, so that they always represent
-the three most-recently used offsets. In order to achieve that, they are
-updated after executing each sequence in the following way:
-
-When the sequence's `offset_value` does not refer to one of the
-`Repeated_Offsets`--when it has value greater than 3, or when it has value 3
-and the sequence's `literals_length` is zero--the `Repeated_Offsets`' values
-are shifted back one, and `Repeated_Offset1` takes on the value of the
-just-used offset.
-
-Otherwise, when the sequence's `offset_value` refers to one of the
-`Repeated_Offsets`--when it has value 1 or 2, or when it has value 3 and the
-sequence's `literals_length` is non-zero--the `Repeated_Offsets` are re-ordered
-so that `Repeated_Offset1` takes on the value of the used Repeated_Offset, and
-the existing values are pushed back from the first `Repeated_Offset` through to
-the `Repeated_Offset` selected by the `offset_value`. This effectively performs
-a single-stepped wrapping rotation of the values of these offsets, so that
-their order again reflects the recency of their use.
-
-The following table shows the values of the `Repeated_Offsets` as a series of
-sequences are applied to them:
-
-| `offset_value` | `literals_length` | `Repeated_Offset1` | `Repeated_Offset2` | `Repeated_Offset3` | Comment |
-|:--------------:|:-----------------:|:------------------:|:------------------:|:------------------:|:-----------------------:|
-| | | 1 | 4 | 8 | starting values |
-| 1114 | 11 | 1111 | 1 | 4 | non-repeat |
-| 1 | 22 | 1111 | 1 | 4 | repeat 1; no change |
-| 2225 | 22 | 2222 | 1111 | 1 | non-repeat |
-| 1114 | 111 | 1111 | 2222 | 1111 | non-repeat |
-| 3336 | 33 | 3333 | 1111 | 2222 | non-repeat |
-| 2 | 22 | 1111 | 3333 | 2222 | repeat 2; swap 1 & 2 |
-| 3 | 33 | 2222 | 1111 | 3333 | repeat 3; rotate 3 to 1 |
-| 3 | 0 | 2221 | 2222 | 1111 | insert resolved offset |
-| 1 | 0 | 2222 | 2221 | 3333 | repeat 2 |
-
-
-Skippable Frames
-----------------
-
-| `Magic_Number` | `Frame_Size` | `User_Data` |
-|:--------------:|:------------:|:-----------:|
-| 4 bytes | 4 bytes | n bytes |
-
-Skippable frames allow the insertion of user-defined metadata
-into a flow of concatenated frames.
-
-Skippable frames defined in this specification are compatible with [LZ4] ones.
-
-[LZ4]:http://www.lz4.org
-
-From a compliant decoder perspective, skippable frames need just be skipped,
-and their content ignored, resuming decoding after the skippable frame.
-
-It can be noted that a skippable frame
-can be used to watermark a stream of concatenated frames
-embedding any kind of tracking information (even just an UUID).
-Users wary of such possibility should scan the stream of concatenated frames
-in an attempt to detect such frame for analysis or removal.
-
-__`Magic_Number`__
-
-4 Bytes, __little-endian__ format.
-Value : 0x184D2A5?, which means any value from 0x184D2A50 to 0x184D2A5F.
-All 16 values are valid to identify a skippable frame.
-This specification doesn't detail any specific tagging for skippable frames.
-
-__`Frame_Size`__
-
-This is the size, in bytes, of the following `User_Data`
-(without including the magic number nor the size field itself).
-This field is represented using 4 Bytes, __little-endian__ format, unsigned 32-bits.
-This means `User_Data` can’t be bigger than (2^32-1) bytes.
-
-__`User_Data`__
-
-The `User_Data` can be anything. Data will just be skipped by the decoder.
-
-
-
-Entropy Encoding
-----------------
-Two types of entropy encoding are used by the Zstandard format:
-FSE, and Huffman coding.
-Huffman is used to compress literals,
-while FSE is used for all other symbols
-(`Literals_Length_Code`, `Match_Length_Code`, offset codes)
-and to compress Huffman headers.
-
-
-FSE
----
-FSE, short for Finite State Entropy, is an entropy codec based on [ANS].
-FSE encoding/decoding involves a state that is carried over between symbols,
-so decoding must be done in the opposite direction as encoding.
-Therefore, all FSE bitstreams are read from end to beginning.
-Note that the order of the bits in the stream is not reversed,
-we just read the elements in the reverse order they are written.
-
-For additional details on FSE, see [Finite State Entropy].
-
-[Finite State Entropy]:https://github.com/Cyan4973/FiniteStateEntropy/
-
-FSE decoding involves a decoding table which has a power of 2 size, and contain three elements:
-`Symbol`, `Num_Bits`, and `Baseline`.
-The `log2` of the table size is its `Accuracy_Log`.
-An FSE state value represents an index in this table.
-
-To obtain the initial state value, consume `Accuracy_Log` bits from the stream as a __little-endian__ value.
-The next symbol in the stream is the `Symbol` indicated in the table for that state.
-To obtain the next state value,
-the decoder should consume `Num_Bits` bits from the stream as a __little-endian__ value and add it to `Baseline`.
-
-[ANS]: https://en.wikipedia.org/wiki/Asymmetric_Numeral_Systems
-
-### FSE Table Description
-To decode FSE streams, it is necessary to construct the decoding table.
-The Zstandard format encodes FSE table descriptions as follows:
-
-An FSE distribution table describes the probabilities of all symbols
-from `0` to the last present one (included)
-on a normalized scale of `1 << Accuracy_Log` .
-Note that there must be two or more symbols with nonzero probability.
-
-It's a bitstream which is read forward, in __little-endian__ fashion.
-It's not necessary to know bitstream exact size,
-it will be discovered and reported by the decoding process.
-
-The bitstream starts by reporting on which scale it operates.
-Let's `low4Bits` designate the lowest 4 bits of the first byte :
-`Accuracy_Log = low4bits + 5`.
-
-Then follows each symbol value, from `0` to last present one.
-The number of bits used by each field is variable.
-It depends on :
-
-- Remaining probabilities + 1 :
- __example__ :
- Presuming an `Accuracy_Log` of 8,
- and presuming 100 probabilities points have already been distributed,
- the decoder may read any value from `0` to `256 - 100 + 1 == 157` (inclusive).
- Therefore, it must read `log2sup(157) == 8` bits.
-
-- Value decoded : small values use 1 less bit :
- __example__ :
- Presuming values from 0 to 157 (inclusive) are possible,
- 255-157 = 98 values are remaining in an 8-bits field.
- They are used this way :
- first 98 values (hence from 0 to 97) use only 7 bits,
- values from 98 to 157 use 8 bits.
- This is achieved through this scheme :
-
- | Value read | Value decoded | Number of bits used |
- | ---------- | ------------- | ------------------- |
- | 0 - 97 | 0 - 97 | 7 |
- | 98 - 127 | 98 - 127 | 8 |
- | 128 - 225 | 0 - 97 | 7 |
- | 226 - 255 | 128 - 157 | 8 |
-
-Symbols probabilities are read one by one, in order.
-
-Probability is obtained from Value decoded by following formula :
-`Proba = value - 1`
-
-It means value `0` becomes negative probability `-1`.
-`-1` is a special probability, which means "less than 1".
-Its effect on distribution table is described in the [next section].
-For the purpose of calculating total allocated probability points, it counts as one.
-
-[next section]:#from-normalized-distribution-to-decoding-tables
-
-When a symbol has a __probability__ of `zero`,
-it is followed by a 2-bits repeat flag.
-This repeat flag tells how many probabilities of zeroes follow the current one.
-It provides a number ranging from 0 to 3.
-If it is a 3, another 2-bits repeat flag follows, and so on.
-
-When last symbol reaches cumulated total of `1 << Accuracy_Log`,
-decoding is complete.
-If the last symbol makes cumulated total go above `1 << Accuracy_Log`,
-distribution is considered corrupted.
-
-Then the decoder can tell how many bytes were used in this process,
-and how many symbols are present.
-The bitstream consumes a round number of bytes.
-Any remaining bit within the last byte is just unused.
-
-#### From normalized distribution to decoding tables
-
-The distribution of normalized probabilities is enough
-to create a unique decoding table.
-
-It follows the following build rule :
-
-The table has a size of `Table_Size = 1 << Accuracy_Log`.
-Each cell describes the symbol decoded,
-and instructions to get the next state (`Number_of_Bits` and `Baseline`).
-
-Symbols are scanned in their natural order for "less than 1" probabilities.
-Symbols with this probability are being attributed a single cell,
-starting from the end of the table and retreating.
-These symbols define a full state reset, reading `Accuracy_Log` bits.
-
-Then, all remaining symbols, sorted in natural order, are allocated cells.
-Starting from symbol `0` (if it exists), and table position `0`,
-each symbol gets allocated as many cells as its probability.
-Cell allocation is spread, not linear :
-each successor position follows this rule :
-
-```
-position += (tableSize>>1) + (tableSize>>3) + 3;
-position &= tableSize-1;
-```
-
-A position is skipped if already occupied by a "less than 1" probability symbol.
-`position` does not reset between symbols, it simply iterates through
-each position in the table, switching to the next symbol when enough
-states have been allocated to the current one.
-
-The process guarantees that the table is entirely filled.
-Each cell corresponds to a state value, which contains the symbol being decoded.
-
-To add the `Number_of_Bits` and `Baseline` required to retrieve next state,
-it's first necessary to sort all occurrences of each symbol in state order.
-Lower states will need 1 more bit than higher ones.
-The process is repeated for each symbol.
-
-__Example__ :
-Presuming a symbol has a probability of 5,
-it receives 5 cells, corresponding to 5 state values.
-These state values are then sorted in natural order.
-
-Next power of 2 after 5 is 8.
-Space of probabilities must be divided into 8 equal parts.
-Presuming the `Accuracy_Log` is 7, it defines a space of 128 states.
-Divided by 8, each share is 16 large.
-
-In order to reach 8 shares, 8-5=3 lowest states will count "double",
-doubling their shares (32 in width), hence requiring one more bit.
-
-Baseline is assigned starting from the higher states using fewer bits,
-increasing at each state, then resuming at the first state,
-each state takes its allocated width from Baseline.
-
-| state value | 1 | 39 | 77 | 84 | 122 |
-| state order | 0 | 1 | 2 | 3 | 4 |
-| ---------------- | ----- | ----- | ------ | ---- | ------ |
-| width | 32 | 32 | 32 | 16 | 16 |
-| `Number_of_Bits` | 5 | 5 | 5 | 4 | 4 |
-| range number | 2 | 4 | 6 | 0 | 1 |
-| `Baseline` | 32 | 64 | 96 | 0 | 16 |
-| range | 32-63 | 64-95 | 96-127 | 0-15 | 16-31 |
-
-During decoding, the next state value is determined from current state value,
-by reading the required `Number_of_Bits`, and adding the specified `Baseline`.
-
-See [Appendix A] for the results of this process applied to the default distributions.
-
-[Appendix A]: #appendix-a---decoding-tables-for-predefined-codes
-
-
-Huffman Coding
---------------
-Zstandard Huffman-coded streams are read backwards,
-similar to the FSE bitstreams.
-Therefore, to find the start of the bitstream, it is therefore to
-know the offset of the last byte of the Huffman-coded stream.
-
-After writing the last bit containing information, the compressor
-writes a single `1`-bit and then fills the byte with 0-7 `0` bits of
-padding. The last byte of the compressed bitstream cannot be `0` for
-that reason.
-
-When decompressing, the last byte containing the padding is the first
-byte to read. The decompressor needs to skip 0-7 initial `0`-bits and
-the first `1`-bit it occurs. Afterwards, the useful part of the bitstream
-begins.
-
-The bitstream contains Huffman-coded symbols in __little-endian__ order,
-with the codes defined by the method below.
-
-### Huffman Tree Description
-
-Prefix coding represents symbols from an a priori known alphabet
-by bit sequences (codewords), one codeword for each symbol,
-in a manner such that different symbols may be represented
-by bit sequences of different lengths,
-but a parser can always parse an encoded string
-unambiguously symbol-by-symbol.
-
-Given an alphabet with known symbol frequencies,
-the Huffman algorithm allows the construction of an optimal prefix code
-using the fewest bits of any possible prefix codes for that alphabet.
-
-Prefix code must not exceed a maximum code length.
-More bits improve accuracy but cost more header size,
-and require more memory or more complex decoding operations.
-This specification limits maximum code length to 11 bits.
-
-#### Representation
-
-All literal values from zero (included) to last present one (excluded)
-are represented by `Weight` with values from `0` to `Max_Number_of_Bits`.
-Transformation from `Weight` to `Number_of_Bits` follows this formula :
-```
-Number_of_Bits = Weight ? (Max_Number_of_Bits + 1 - Weight) : 0
-```
-The last symbol's `Weight` is deduced from previously decoded ones,
-by completing to the nearest power of 2.
-This power of 2 gives `Max_Number_of_Bits`, the depth of the current tree.
-`Max_Number_of_Bits` must be <= 11,
-otherwise the representation is considered corrupted.
-
-__Example__ :
-Let's presume the following Huffman tree must be described :
-
-| literal value | 0 | 1 | 2 | 3 | 4 | 5 |
-| ---------------- | --- | --- | --- | --- | --- | --- |
-| `Number_of_Bits` | 1 | 2 | 3 | 0 | 4 | 4 |
-
-The tree depth is 4, since its longest elements uses 4 bits
-(longest elements are the one with smallest frequency).
-Value `5` will not be listed, as it can be determined from values for 0-4,
-nor will values above `5` as they are all 0.
-Values from `0` to `4` will be listed using `Weight` instead of `Number_of_Bits`.
-Weight formula is :
-```
-Weight = Number_of_Bits ? (Max_Number_of_Bits + 1 - Number_of_Bits) : 0
-```
-It gives the following series of weights :
-
-| literal value | 0 | 1 | 2 | 3 | 4 |
-| ------------- | --- | --- | --- | --- | --- |
-| `Weight` | 4 | 3 | 2 | 0 | 1 |
-
-The decoder will do the inverse operation :
-having collected weights of literal symbols from `0` to `4`,
-it knows the last literal, `5`, is present with a non-zero `Weight`.
-The `Weight` of `5` can be determined by advancing to the next power of 2.
-The sum of `2^(Weight-1)` (excluding 0's) is :
-`8 + 4 + 2 + 0 + 1 = 15`.
-Nearest larger power of 2 value is 16.
-Therefore, `Max_Number_of_Bits = 4` and `Weight[5] = 16-15 = 1`.
-
-#### Huffman Tree header
-
-This is a single byte value (0-255),
-which describes how the series of weights is encoded.
-
-- if `headerByte` < 128 :
- the series of weights is compressed using FSE (see below).
- The length of the FSE-compressed series is equal to `headerByte` (0-127).
-
-- if `headerByte` >= 128 :
- + the series of weights uses a direct representation,
- where each `Weight` is encoded directly as a 4 bits field (0-15).
- + They are encoded forward, 2 weights to a byte,
- first weight taking the top four bits and second one taking the bottom four.
- * e.g. the following operations could be used to read the weights:
- `Weight[0] = (Byte[0] >> 4), Weight[1] = (Byte[0] & 0xf)`, etc.
- + The full representation occupies `Ceiling(Number_of_Weights/2)` bytes,
- meaning it uses only full bytes even if `Number_of_Weights` is odd.
- + `Number_of_Weights = headerByte - 127`.
- * Note that maximum `Number_of_Weights` is 255-127 = 128,
- therefore, only up to 128 `Weight` can be encoded using direct representation.
- * Since the last non-zero `Weight` is _not_ encoded,
- this scheme is compatible with alphabet sizes of up to 129 symbols,
- hence including literal symbol 128.
- * If any literal symbol > 128 has a non-zero `Weight`,
- direct representation is not possible.
- In such case, it's necessary to use FSE compression.
-
-
-#### Finite State Entropy (FSE) compression of Huffman weights
-
-In this case, the series of Huffman weights is compressed using FSE compression.
-It's a single bitstream with 2 interleaved states,
-sharing a single distribution table.
-
-To decode an FSE bitstream, it is necessary to know its compressed size.
-Compressed size is provided by `headerByte`.
-It's also necessary to know its _maximum possible_ decompressed size,
-which is `255`, since literal values span from `0` to `255`,
-and last symbol's `Weight` is not represented.
-
-An FSE bitstream starts by a header, describing probabilities distribution.
-It will create a Decoding Table.
-For a list of Huffman weights, the maximum accuracy log is 6 bits.
-For more description see the [FSE header description](#fse-table-description)
-
-The Huffman header compression uses 2 states,
-which share the same FSE distribution table.
-The first state (`State1`) encodes the even indexed symbols,
-and the second (`State2`) encodes the odd indexed symbols.
-`State1` is initialized first, and then `State2`, and they take turns
-decoding a single symbol and updating their state.
-For more details on these FSE operations, see the [FSE section](#fse).
-
-The number of symbols to decode is determined
-by tracking bitStream overflow condition:
-If updating state after decoding a symbol would require more bits than
-remain in the stream, it is assumed that extra bits are 0. Then,
-symbols for each of the final states are decoded and the process is complete.
-
-#### Conversion from weights to Huffman prefix codes
-
-All present symbols shall now have a `Weight` value.
-It is possible to transform weights into `Number_of_Bits`, using this formula:
-```
-Number_of_Bits = (Weight>0) ? Max_Number_of_Bits + 1 - Weight : 0
-```
-Symbols are sorted by `Weight`.
-Within same `Weight`, symbols keep natural sequential order.
-Symbols with a `Weight` of zero are removed.
-Then, starting from lowest `Weight`, prefix codes are distributed in sequential order.
-
-__Example__ :
-Let's presume the following list of weights has been decoded :
-
-| Literal | 0 | 1 | 2 | 3 | 4 | 5 |
-| -------- | --- | --- | --- | --- | --- | --- |
-| `Weight` | 4 | 3 | 2 | 0 | 1 | 1 |
-
-Sorted by weight and then natural sequential order,
-it gives the following distribution :
-
-| Literal | 3 | 4 | 5 | 2 | 1 | 0 |
-| ---------------- | --- | --- | --- | --- | --- | ---- |
-| `Weight` | 0 | 1 | 1 | 2 | 3 | 4 |
-| `Number_of_Bits` | 0 | 4 | 4 | 3 | 2 | 1 |
-| prefix codes | N/A | 0000| 0001| 001 | 01 | 1 |
-
-### Huffman-coded Streams
-
-Given a Huffman decoding table,
-it's possible to decode a Huffman-coded stream.
-
-Each bitstream must be read _backward_,
-that is starting from the end down to the beginning.
-Therefore it's necessary to know the size of each bitstream.
-
-It's also necessary to know exactly which _bit_ is the last one.
-This is detected by a final bit flag :
-the highest bit of latest byte is a final-bit-flag.
-Consequently, a last byte of `0` is not possible.
-And the final-bit-flag itself is not part of the useful bitstream.
-Hence, the last byte contains between 0 and 7 useful bits.
-
-Starting from the end,
-it's possible to read the bitstream in a __little-endian__ fashion,
-keeping track of already used bits. Since the bitstream is encoded in reverse
-order, starting from the end read symbols in forward order.
-
-For example, if the literal sequence "0145" was encoded using above prefix code,
-it would be encoded (in reverse order) as:
-
-|Symbol | 5 | 4 | 1 | 0 | Padding |
-|--------|------|------|----|---|---------|
-|Encoding|`0000`|`0001`|`01`|`1`| `00001` |
-
-Resulting in following 2-bytes bitstream :
-```
-00010000 00001101
-```
-
-Here is an alternative representation with the symbol codes separated by underscore:
-```
-0001_0000 00001_1_01
-```
-
-Reading highest `Max_Number_of_Bits` bits,
-it's possible to compare extracted value to decoding table,
-determining the symbol to decode and number of bits to discard.
-
-The process continues up to reading the required number of symbols per stream.
-If a bitstream is not entirely and exactly consumed,
-hence reaching exactly its beginning position with _all_ bits consumed,
-the decoding process is considered faulty.
-
-
-Dictionary Format
------------------
-
-Zstandard is compatible with "raw content" dictionaries,
-free of any format restriction, except that they must be at least 8 bytes.
-These dictionaries function as if they were just the `Content` part
-of a formatted dictionary.
-
-But dictionaries created by `zstd --train` follow a format, described here.
-
-__Pre-requisites__ : a dictionary has a size,
- defined either by a buffer limit, or a file size.
-
-| `Magic_Number` | `Dictionary_ID` | `Entropy_Tables` | `Content` |
-| -------------- | --------------- | ---------------- | --------- |
-
-__`Magic_Number`__ : 4 bytes ID, value 0xEC30A437, __little-endian__ format
-
-__`Dictionary_ID`__ : 4 bytes, stored in __little-endian__ format.
- `Dictionary_ID` can be any value, except 0 (which means no `Dictionary_ID`).
- It's used by decoders to check if they use the correct dictionary.
-
-_Reserved ranges :_
-If the dictionary is going to be distributed in a public environment,
-the following ranges of `Dictionary_ID` are reserved for some future registrar
-and shall not be used :
-
- - low range : <= 32767
- - high range : >= (2^31)
-
-Outside of these ranges, any value of `Dictionary_ID`
-which is both `>= 32768` and `< (1<<31)` can be used freely,
-even in public environment.
-
-
-__`Entropy_Tables`__ : follow the same format as tables in [compressed blocks].
- See the relevant [FSE](#fse-table-description)
- and [Huffman](#huffman-tree-description) sections for how to decode these tables.
- They are stored in following order :
- Huffman tables for literals, FSE table for offsets,
- FSE table for match lengths, and FSE table for literals lengths.
- These tables populate the Repeat Stats literals mode and
- Repeat distribution mode for sequence decoding.
- It's finally followed by 3 offset values, populating recent offsets (instead of using `{1,4,8}`),
- stored in order, 4-bytes __little-endian__ each, for a total of 12 bytes.
- Each recent offset must have a value <= dictionary content size, and cannot equal 0.
-
-__`Content`__ : The rest of the dictionary is its content.
- The content act as a "past" in front of data to compress or decompress,
- so it can be referenced in sequence commands.
- As long as the amount of data decoded from this frame is less than or
- equal to `Window_Size`, sequence commands may specify offsets longer
- than the total length of decoded output so far to reference back to the
- dictionary, even parts of the dictionary with offsets larger than `Window_Size`.
- After the total output has surpassed `Window_Size` however,
- this is no longer allowed and the dictionary is no longer accessible.
-
-[compressed blocks]: #the-format-of-compressed_block
-
-If a dictionary is provided by an external source,
-it should be loaded with great care, its content considered untrusted.
-
-
-
-Appendix A - Decoding tables for predefined codes
--------------------------------------------------
-
-This appendix contains FSE decoding tables
-for the predefined literal length, match length, and offset codes.
-The tables have been constructed using the algorithm as given above in chapter
-"from normalized distribution to decoding tables".
-The tables here can be used as examples
-to crosscheck that an implementation build its decoding tables correctly.
-
-#### Literal Length Code:
-
-| State | Symbol | Number_Of_Bits | Base |
-| ----- | ------ | -------------- | ---- |
-| 0 | 0 | 4 | 0 |
-| 1 | 0 | 4 | 16 |
-| 2 | 1 | 5 | 32 |
-| 3 | 3 | 5 | 0 |
-| 4 | 4 | 5 | 0 |
-| 5 | 6 | 5 | 0 |
-| 6 | 7 | 5 | 0 |
-| 7 | 9 | 5 | 0 |
-| 8 | 10 | 5 | 0 |
-| 9 | 12 | 5 | 0 |
-| 10 | 14 | 6 | 0 |
-| 11 | 16 | 5 | 0 |
-| 12 | 18 | 5 | 0 |
-| 13 | 19 | 5 | 0 |
-| 14 | 21 | 5 | 0 |
-| 15 | 22 | 5 | 0 |
-| 16 | 24 | 5 | 0 |
-| 17 | 25 | 5 | 32 |
-| 18 | 26 | 5 | 0 |
-| 19 | 27 | 6 | 0 |
-| 20 | 29 | 6 | 0 |
-| 21 | 31 | 6 | 0 |
-| 22 | 0 | 4 | 32 |
-| 23 | 1 | 4 | 0 |
-| 24 | 2 | 5 | 0 |
-| 25 | 4 | 5 | 32 |
-| 26 | 5 | 5 | 0 |
-| 27 | 7 | 5 | 32 |
-| 28 | 8 | 5 | 0 |
-| 29 | 10 | 5 | 32 |
-| 30 | 11 | 5 | 0 |
-| 31 | 13 | 6 | 0 |
-| 32 | 16 | 5 | 32 |
-| 33 | 17 | 5 | 0 |
-| 34 | 19 | 5 | 32 |
-| 35 | 20 | 5 | 0 |
-| 36 | 22 | 5 | 32 |
-| 37 | 23 | 5 | 0 |
-| 38 | 25 | 4 | 0 |
-| 39 | 25 | 4 | 16 |
-| 40 | 26 | 5 | 32 |
-| 41 | 28 | 6 | 0 |
-| 42 | 30 | 6 | 0 |
-| 43 | 0 | 4 | 48 |
-| 44 | 1 | 4 | 16 |
-| 45 | 2 | 5 | 32 |
-| 46 | 3 | 5 | 32 |
-| 47 | 5 | 5 | 32 |
-| 48 | 6 | 5 | 32 |
-| 49 | 8 | 5 | 32 |
-| 50 | 9 | 5 | 32 |
-| 51 | 11 | 5 | 32 |
-| 52 | 12 | 5 | 32 |
-| 53 | 15 | 6 | 0 |
-| 54 | 17 | 5 | 32 |
-| 55 | 18 | 5 | 32 |
-| 56 | 20 | 5 | 32 |
-| 57 | 21 | 5 | 32 |
-| 58 | 23 | 5 | 32 |
-| 59 | 24 | 5 | 32 |
-| 60 | 35 | 6 | 0 |
-| 61 | 34 | 6 | 0 |
-| 62 | 33 | 6 | 0 |
-| 63 | 32 | 6 | 0 |
-
-#### Match Length Code:
-
-| State | Symbol | Number_Of_Bits | Base |
-| ----- | ------ | -------------- | ---- |
-| 0 | 0 | 6 | 0 |
-| 1 | 1 | 4 | 0 |
-| 2 | 2 | 5 | 32 |
-| 3 | 3 | 5 | 0 |
-| 4 | 5 | 5 | 0 |
-| 5 | 6 | 5 | 0 |
-| 6 | 8 | 5 | 0 |
-| 7 | 10 | 6 | 0 |
-| 8 | 13 | 6 | 0 |
-| 9 | 16 | 6 | 0 |
-| 10 | 19 | 6 | 0 |
-| 11 | 22 | 6 | 0 |
-| 12 | 25 | 6 | 0 |
-| 13 | 28 | 6 | 0 |
-| 14 | 31 | 6 | 0 |
-| 15 | 33 | 6 | 0 |
-| 16 | 35 | 6 | 0 |
-| 17 | 37 | 6 | 0 |
-| 18 | 39 | 6 | 0 |
-| 19 | 41 | 6 | 0 |
-| 20 | 43 | 6 | 0 |
-| 21 | 45 | 6 | 0 |
-| 22 | 1 | 4 | 16 |
-| 23 | 2 | 4 | 0 |
-| 24 | 3 | 5 | 32 |
-| 25 | 4 | 5 | 0 |
-| 26 | 6 | 5 | 32 |
-| 27 | 7 | 5 | 0 |
-| 28 | 9 | 6 | 0 |
-| 29 | 12 | 6 | 0 |
-| 30 | 15 | 6 | 0 |
-| 31 | 18 | 6 | 0 |
-| 32 | 21 | 6 | 0 |
-| 33 | 24 | 6 | 0 |
-| 34 | 27 | 6 | 0 |
-| 35 | 30 | 6 | 0 |
-| 36 | 32 | 6 | 0 |
-| 37 | 34 | 6 | 0 |
-| 38 | 36 | 6 | 0 |
-| 39 | 38 | 6 | 0 |
-| 40 | 40 | 6 | 0 |
-| 41 | 42 | 6 | 0 |
-| 42 | 44 | 6 | 0 |
-| 43 | 1 | 4 | 32 |
-| 44 | 1 | 4 | 48 |
-| 45 | 2 | 4 | 16 |
-| 46 | 4 | 5 | 32 |
-| 47 | 5 | 5 | 32 |
-| 48 | 7 | 5 | 32 |
-| 49 | 8 | 5 | 32 |
-| 50 | 11 | 6 | 0 |
-| 51 | 14 | 6 | 0 |
-| 52 | 17 | 6 | 0 |
-| 53 | 20 | 6 | 0 |
-| 54 | 23 | 6 | 0 |
-| 55 | 26 | 6 | 0 |
-| 56 | 29 | 6 | 0 |
-| 57 | 52 | 6 | 0 |
-| 58 | 51 | 6 | 0 |
-| 59 | 50 | 6 | 0 |
-| 60 | 49 | 6 | 0 |
-| 61 | 48 | 6 | 0 |
-| 62 | 47 | 6 | 0 |
-| 63 | 46 | 6 | 0 |
-
-#### Offset Code:
-
-| State | Symbol | Number_Of_Bits | Base |
-| ----- | ------ | -------------- | ---- |
-| 0 | 0 | 5 | 0 |
-| 1 | 6 | 4 | 0 |
-| 2 | 9 | 5 | 0 |
-| 3 | 15 | 5 | 0 |
-| 4 | 21 | 5 | 0 |
-| 5 | 3 | 5 | 0 |
-| 6 | 7 | 4 | 0 |
-| 7 | 12 | 5 | 0 |
-| 8 | 18 | 5 | 0 |
-| 9 | 23 | 5 | 0 |
-| 10 | 5 | 5 | 0 |
-| 11 | 8 | 4 | 0 |
-| 12 | 14 | 5 | 0 |
-| 13 | 20 | 5 | 0 |
-| 14 | 2 | 5 | 0 |
-| 15 | 7 | 4 | 16 |
-| 16 | 11 | 5 | 0 |
-| 17 | 17 | 5 | 0 |
-| 18 | 22 | 5 | 0 |
-| 19 | 4 | 5 | 0 |
-| 20 | 8 | 4 | 16 |
-| 21 | 13 | 5 | 0 |
-| 22 | 19 | 5 | 0 |
-| 23 | 1 | 5 | 0 |
-| 24 | 6 | 4 | 16 |
-| 25 | 10 | 5 | 0 |
-| 26 | 16 | 5 | 0 |
-| 27 | 28 | 5 | 0 |
-| 28 | 27 | 5 | 0 |
-| 29 | 26 | 5 | 0 |
-| 30 | 25 | 5 | 0 |
-| 31 | 24 | 5 | 0 |
-
-
-
-Appendix B - Resources for implementers
--------------------------------------------------
-
-An open source reference implementation is available on :
-https://github.com/facebook/zstd
-
-The project contains a frame generator, called [decodeCorpus],
-which can be used by any 3rd-party implementation
-to verify that a tested decoder is compliant with the specification.
-
-[decodeCorpus]: https://github.com/facebook/zstd/tree/v1.3.4/tests#decodecorpus---tool-to-generate-zstandard-frames-for-decoder-testing
-
-`decodeCorpus` generates random valid frames.
-A compliant decoder should be able to decode them all,
-or at least provide a meaningful error code explaining for which reason it cannot
-(memory limit restrictions for example).
-
-
-Version changes
----------------
-- 0.3.7 : clarifications for Repeat_Offsets, matching RFC8878
-- 0.3.6 : clarifications for Dictionary_ID
-- 0.3.5 : clarifications for Block_Maximum_Size
-- 0.3.4 : clarifications for FSE decoding table
-- 0.3.3 : clarifications for field Block_Size
-- 0.3.2 : remove additional block size restriction on compressed blocks
-- 0.3.1 : minor clarification regarding offset history update rules
-- 0.3.0 : minor edits to match RFC8478
-- 0.2.9 : clarifications for huffman weights direct representation, by Ulrich Kunitz
-- 0.2.8 : clarifications for IETF RFC discuss
-- 0.2.7 : clarifications from IETF RFC review, by Vijay Gurbani and Nick Terrell
-- 0.2.6 : fixed an error in huffman example, by Ulrich Kunitz
-- 0.2.5 : minor typos and clarifications
-- 0.2.4 : section restructuring, by Sean Purcell
-- 0.2.3 : clarified several details, by Sean Purcell
-- 0.2.2 : added predefined codes, by Johannes Rudolph
-- 0.2.1 : clarify field names, by Przemyslaw Skibinski
-- 0.2.0 : numerous format adjustments for zstd v0.8+
-- 0.1.2 : limit Huffman tree depth to 11 bits
-- 0.1.1 : reserved dictID ranges
-- 0.1.0 : initial release