
bpo-31993: Do not use memoryview when pickle large strings. #5154


Merged

Conversation

@serhiy-storchaka (Member) commented Jan 11, 2018:

PyMemoryView_FromMemory() created a memoryview referring to the internal data of the string. When the string is destroyed, the memoryview ends up referring to freed memory.

https://bugs.python.org/issue31993
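A minimal sketch of the hazard (illustrative only, not code from the patch; the function name is hypothetical): PyMemoryView_FromMemory() wraps a raw pointer and keeps no reference to the object that owns the memory, so the view dangles once that owner is deallocated.

    #include <Python.h>

    /* Sketch of the use-after-free, assuming a large ASCII str whose
       UTF-8 representation is its internal data. Error checks elided. */
    static void dangling_view_demo(void)
    {
        PyObject *str = PyUnicode_FromString("large ASCII payload ...");
        Py_ssize_t size;
        /* Borrowed pointer into str's internal buffer: */
        const char *data = PyUnicode_AsUTF8AndSize(str, &size);
        /* The memoryview records only the raw pointer; no reference
           to str is taken: */
        PyObject *view = PyMemoryView_FromMemory((char *)data, size,
                                                 PyBUF_READ);
        Py_DECREF(str);   /* str can be freed here...                  */
        /* ...and any later read through `view` is a use-after-free.   */
        Py_DECREF(view);
    }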

@ogrisel (Contributor) left a comment:

Some comments. I am concerned about the high memory usage incurred by the use of PyBytes_FromStringAndSize, but of course I agree that the priority is to fix the safety issue.

@@ -2179,57 +2179,60 @@ def write(self, chunk):
def concatenate_chunks(self):
# Some chunks can be memoryview instances, we need to convert
# them to bytes to be able to call join
Contributor:

This comment should now be removed.


  # Actually read the binary content of the chunks after the end
- # of the call to dump: ant memoryview passed to write should not
+ # of the call to dump: and memoryview passed to write should not
Contributor:

I think this should read "any memoryview" instead.

self.assertGreaterEqual(9, chunk_size)
self.assertLess(chunk_size, 2 * self.FRAME_SIZE_TARGET,
chunk_sizes)
# There shouldn't bee too much small chunks.
Contributor:

too many

Contributor:

Maybe add a comment: the protocol header, the frame headers and the large string headers are written in small chunks.

@@ -2184,8 +2184,7 @@ _Pickler_write_bytes(PicklerObject *self,
     /* Stream write the payload into the file without going through the
        output buffer. */
     if (payload == NULL) {
-        payload = mem = PyMemoryView_FromMemory((char *) data, data_size,
-                                                PyBUF_READ);
+        payload = mem = PyBytes_FromStringAndSize(data, data_size);
Contributor:

This forces a memory copy. Wouldn't it be possible to create a read-only memoryview that keeps a reference to the original ASCII str object to avoid the large memory overhead?

Member Author (@serhiy-storchaka):

This is not easy. Currently there is no API for creating a memoryview over an area of memory with a linked Python object. Designing such an API is a separate issue. There are other issues on the tracker that need this API; for example, BytesIO seems to suffer from the same problem.

Here I am just trying to fix a regression introduced by #4353.

Contributor:

I agree.
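For contrast, a sketch of why the merged fix is safe (assuming the same data/data_size as in the hunk above; surrounding error handling trimmed): PyBytes_FromStringAndSize() copies the bytes into a new object that owns its buffer, so the payload stays valid no matter when the source string is released, at the cost of the transient copy noted above.

    /* The bytes object owns a private copy of `data`, so the payload
       cannot dangle; the trade-off is one extra allocation and memcpy. */
    PyObject *payload = PyBytes_FromStringAndSize(data, data_size);
    if (payload == NULL)
        return -1;                 /* propagate the MemoryError */
    /* ... stream `payload` into the underlying file object ... */
    Py_DECREF(payload);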

@serhiy-storchaka (Member Author) left a comment:

Thank you for your review @ogrisel.

@pitrou (Member) commented Jan 11, 2018:

This looks fine to me. Does the previous NEWS file need updating?

-The picklers no longer allocate temporary memory when dumping large
-``bytes`` and ``str`` objects into a file object. Instead the data is
-directly streamed into the underlying file object.
+The pickler now uses less memory when serialize large bytes and str
Contributor:

when serializing

@serhiy-storchaka serhiy-storchaka merged commit 5b76bdb into python:master Jan 12, 2018
@serhiy-storchaka serhiy-storchaka deleted the pickle-large-str-memoryview branch January 12, 2018 22:28
Labels: skip news, type-bug (an unexpected behavior, bug, or error)