Add pcodec #501
Merged
14 commits
2bbfdaa added PCodec (rabernat)
6d2b662 fix line length and print statements (rabernat)
3eb20d1 docs (rabernat)
efb1227 mock pcodec on rtd (rabernat)
a1c8d5c fix typo (rabernat)
c9bfa6c add dtype details (rabernat)
f999831 changed import style for pcodec (rabernat)
1c44cf2 Merge remote-tracking branch 'upstream/main' into pcodec (rabernat)
2650be8 fix flake8 (rabernat)
e81004d revert import changes (rabernat)
eaab355 fix errors due to changes in pcodec API (rabernat)
78a665e change import style (rabernat)
6bfd88f skip coverage of failed import path (rabernat)
d637773 skip pcodec tests if not installed (rabernat)
Submodule c-blosc
updated
4 files
+1 −1 | .github/workflows/cmake.yml | |
+7 −0 | RELEASE_NOTES.rst | |
+2 −2 | blosc/blosc.h | |
+1 −1 | tests/test_common.h |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -67,6 +67,7 @@ Contents | |
abc | ||
registry | ||
blosc | ||
pcodec | ||
lz4 | ||
zfpy | ||
zstd | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
PCodec | ||
====== | ||
|
||
.. automodule:: numcodecs.pcodec | ||
|
||
.. autoclass:: PCodec | ||
|
||
.. autoattribute:: codec_id | ||
.. automethod:: encode | ||
.. automethod:: decode |
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
{ | ||
"delta_encoding_order": null, | ||
"equal_pages_up_to": 262144, | ||
"float_mult_spec": "enabled", | ||
"id": "pcodec", | ||
"int_mult_spec": "enabled", | ||
"level": 8 | ||
} |
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
{ | ||
"delta_encoding_order": null, | ||
"equal_pages_up_to": 262144, | ||
"float_mult_spec": "enabled", | ||
"id": "pcodec", | ||
"int_mult_spec": "enabled", | ||
"level": 1 | ||
} |
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
{ | ||
"delta_encoding_order": null, | ||
"equal_pages_up_to": 262144, | ||
"float_mult_spec": "enabled", | ||
"id": "pcodec", | ||
"int_mult_spec": "enabled", | ||
"level": 5 | ||
} |
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
{ | ||
"delta_encoding_order": null, | ||
"equal_pages_up_to": 262144, | ||
"float_mult_spec": "enabled", | ||
"id": "pcodec", | ||
"int_mult_spec": "enabled", | ||
"level": 9 | ||
} |
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
{ | ||
"delta_encoding_order": null, | ||
"equal_pages_up_to": 262144, | ||
"float_mult_spec": "disabled", | ||
"id": "pcodec", | ||
"int_mult_spec": "disabled", | ||
"level": 8 | ||
} |
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
{ | ||
"delta_encoding_order": null, | ||
"equal_pages_up_to": 300, | ||
"float_mult_spec": "enabled", | ||
"id": "pcodec", | ||
"int_mult_spec": "enabled", | ||
"level": 8 | ||
} |
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,89 @@ | ||
from typing import Optional, Literal | ||
|
||
import numcodecs | ||
import numcodecs.abc | ||
from numcodecs.compat import ensure_contiguous_ndarray | ||
|
||
try: | ||
from pcodec import standalone, ChunkConfig, PagingSpec | ||
except ImportError: # pragma: no cover | ||
standalone = None | ||
|
||
|
||
DEFAULT_MAX_PAGE_N = 262144 | ||
|
||
|
||
class PCodec(numcodecs.abc.Codec): | ||
""" | ||
PCodec (or pco, pronounced "pico") losslessly compresses and decompresses | ||
numerical sequences with high compression ratio and fast speed. | ||
|
||
See `PCodec Repo <https://github.com/mwlon/pcodec>`_ for more information. | ||
|
||
PCodec supports only the following numerical dtypes: uint32, uint64, int32, | ||
int64, float32, and float64. | ||
|
||
Parameters | ||
---------- | ||
level : int | ||
A compression level from 0-12, where 12 takes the longest and compresses | ||
the most. | ||
delta_encoding_order : int or None | ||
Either a delta encoding order from 0-7 or None. If set to None, pcodec | ||
will try to infer the optimal delta encoding order. | ||
int_mult_spec : {'enabled', 'disabled'} | ||
If enabled, pcodec will consider using int mult mode, which can | ||
substantially improve compression ratio but decrease speed in some cases | ||
for integer types. | ||
float_mult_spec : {'enabled', 'disabled'} | ||
If enabled, pcodec will consider using float mult mode, which can | ||
substantially improve compression ratio but decrease speed in some cases | ||
for float types. | ||
equal_pages_up_to : int | ||
Divide the chunk into equal pages of up to this many numbers. | ||
""" | ||
|
||
codec_id = "pcodec" | ||
|
||
def __init__( | ||
self, | ||
level: int = 8, | ||
delta_encoding_order: Optional[int] = None, | ||
int_mult_spec: Literal["enabled", "disabled"] = "enabled", | ||
float_mult_spec: Literal["enabled", "disabled"] = "enabled", | ||
equal_pages_up_to: int = DEFAULT_MAX_PAGE_N, | ||
): | ||
if standalone is None: # pragma: no cover | ||
raise ImportError( | ||
"pcodec must be installed to use the PCodec codec." | ||
) | ||
|
||
# note that we use `level` instead of `compression_level` to | ||
# match other codecs | ||
self.level = level | ||
self.delta_encoding_order = delta_encoding_order | ||
self.int_mult_spec = int_mult_spec | ||
self.float_mult_spec = float_mult_spec | ||
self.equal_pages_up_to = equal_pages_up_to | ||
|
||
def encode(self, buf): | ||
buf = ensure_contiguous_ndarray(buf) | ||
|
||
paging_spec = PagingSpec.equal_pages_up_to(self.equal_pages_up_to) | ||
|
||
config = ChunkConfig( | ||
compression_level=self.level, | ||
delta_encoding_order=self.delta_encoding_order, | ||
int_mult_spec=self.int_mult_spec, | ||
float_mult_spec=self.float_mult_spec, | ||
paging_spec=paging_spec, | ||
) | ||
return standalone.simple_compress(buf, config) | ||
|
||
def decode(self, buf, out=None): | ||
if out is not None: | ||
out = ensure_contiguous_ndarray(out) | ||
standalone.simple_decompress_into(buf, out) | ||
return out | ||
else: | ||
return standalone.simple_decompress(buf) |
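The docstring above mentions a `delta_encoding_order` from 0-7 without showing what an "order" means. As a rough illustration only, here is a minimal pure-Python sketch of order-k delta encoding on integers. It is not pcodec's implementation; the function names `delta_encode` and `delta_decode` are hypothetical and chosen for this example.

```python
from itertools import accumulate


def delta_encode(values, order):
    """Apply `order` rounds of differencing, keeping each round's first element."""
    heads = []
    current = list(values)
    for _ in range(order):
        if not current:
            break
        heads.append(current[0])
        # Replace the sequence with the differences of consecutive elements.
        current = [b - a for a, b in zip(current, current[1:])]
    return heads, current


def delta_decode(heads, residuals):
    """Invert delta_encode by cumulatively summing once per stored head."""
    current = list(residuals)
    for head in reversed(heads):
        current = list(accumulate(current, initial=head))
    return current


# Squares have constant second differences, so order 2 leaves tiny residuals.
heads, residuals = delta_encode([1, 4, 9, 16, 25], order=2)
print(heads, residuals)
print(delta_decode(heads, residuals))
```

Smoothly varying data tends to leave small residuals after differencing, which is why a codec may benefit from choosing the order per chunk, as the `None` default asks pcodec to do.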
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
import pytest | ||
import numpy as np | ||
|
||
from numcodecs.pcodec import PCodec | ||
|
||
try: | ||
# initializing codec triggers ImportError | ||
PCodec() | ||
except ImportError: # pragma: no cover | ||
pytest.skip( | ||
"pcodec not available", allow_module_level=True | ||
) | ||
|
||
from numcodecs.tests.common import ( | ||
check_encode_decode_array_to_bytes, | ||
check_config, | ||
check_repr, | ||
check_backwards_compatibility, | ||
check_err_decode_object_buffer, | ||
check_err_encode_object_buffer, | ||
) | ||
|
||
|
||
codecs = [ | ||
PCodec(), | ||
PCodec(level=1), | ||
PCodec(level=5), | ||
PCodec(level=9), | ||
PCodec(float_mult_spec="disabled", int_mult_spec="disabled"), | ||
PCodec(equal_pages_up_to=300), | ||
] | ||
|
||
|
||
# mix of dtypes: integer, float | ||
# mix of shapes: 1D, 2D | ||
# mix of orders: C, F | ||
arrays = [ | ||
np.arange(1000, dtype="u4"), | ||
np.arange(1000, dtype="u8"), | ||
np.arange(1000, dtype="i4"), | ||
np.arange(1000, dtype="i8"), | ||
np.linspace(1000, 1001, 1000, dtype="f4"), | ||
np.linspace(1000, 1001, 1000, dtype="f8"), | ||
np.random.normal(loc=1000, scale=1, size=(100, 10)), | ||
np.asfortranarray(np.random.normal(loc=1000, scale=1, size=(100, 10))), | ||
np.random.randint(0, 2**60, size=1000, dtype="u8"), | ||
np.random.randint(-(2**63), -(2**63) + 20, size=1000, dtype="i8"), | ||
] | ||
|
||
|
||
@pytest.mark.parametrize("arr", arrays) | ||
@pytest.mark.parametrize("codec", codecs) | ||
def test_encode_decode(arr, codec): | ||
check_encode_decode_array_to_bytes(arr, codec) | ||
|
||
|
||
def test_config(): | ||
codec = PCodec(level=3) | ||
check_config(codec) | ||
|
||
|
||
def test_repr(): | ||
check_repr( | ||
"PCodec(delta_encoding_order=None, equal_pages_up_to=262144, float_mult_spec='enabled', " | ||
"int_mult_spec='enabled', level=3)" | ||
) | ||
|
||
|
||
def test_backwards_compatibility(): | ||
check_backwards_compatibility(PCodec.codec_id, arrays, codecs) | ||
|
||
|
||
def test_err_decode_object_buffer(): | ||
check_err_decode_object_buffer(PCodec()) | ||
|
||
|
||
def test_err_encode_object_buffer(): | ||
check_err_encode_object_buffer(PCodec()) |
?
(This may be a good way to label array-bytes codecs; maybe also the type for `buf` should be `ndarray`.)
I'm not convinced that there is much value in adding these sorts of type hints if we are not actually running type checking on the library.
Is it not for other users of the library, and their IDEs?
But incorrect type hints are worse than none at all! For example, is `ndarray` really the correct type for `buf`? Maybe, but who knows? I could add it, but without running mypy, we'll never know for sure.
True that correctness is important, of course; but this is like the "light" version of array->array vs. array->bytes vs. bytes->bytes. Still useful. You can always get around mypy too, if you want.
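For concreteness, one way the "label" idea from this thread could be expressed is with a `typing.Protocol`. This is only a sketch under assumptions: `ArrayBytesCodec` and `PackInt32` are hypothetical names, not numcodecs classes, and `Sequence[int]` stands in for `ndarray` so the example runs with the standard library alone.

```python
import struct
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class ArrayBytesCodec(Protocol):
    """Hypothetical label for codecs that turn an array into bytes."""

    def encode(self, buf: Sequence[int]) -> bytes: ...

    def decode(self, buf: bytes) -> Sequence[int]: ...


class PackInt32:
    """Toy array->bytes codec: packs ints as little-endian int32."""

    def encode(self, buf: Sequence[int]) -> bytes:
        return struct.pack(f"<{len(buf)}i", *buf)

    def decode(self, buf: bytes) -> Sequence[int]:
        return list(struct.unpack(f"<{len(buf) // 4}i", buf))


codec = PackInt32()
# The runtime check only confirms that the method names exist.
print(isinstance(codec, ArrayBytesCodec))
print(codec.decode(codec.encode([1, -2, 3])))
```

Note that the `isinstance` check does not validate the annotated types, which is the concern raised above: without a static checker such as mypy in CI, nothing confirms that the hints match what the methods actually accept.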