Commit 269cb34

[3.12] gh-113028: Correctly memoize str in pickle when escapes added (GH-113436) (GH-113448)
This fixes a divergence between the Python and C implementations of pickle for protocol 0, in which pickle.py fails to re-use the first pickled representation of strings involving characters that have to be escaped.

(cherry picked from commit 0839863)

Co-authored-by: Jeff Allen <[email protected]>
1 parent 15ea4a4 commit 269cb34
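
The commit message refers to characters that have to be escaped under protocol 0. As a minimal sketch of why, the snippet below copies the escaping steps from `save_str` into a standalone function. The name `escape_for_protocol0` is hypothetical and is not part of `pickle`. It relies on the fact that the `raw-unicode-escape` codec leaves backslashes, NUL, newlines, carriage returns and `\x1a` untouched, so they are rewritten as `\uXXXX` sequences first.

```python
def escape_for_protocol0(s: str) -> bytes:
    """Hypothetical helper mirroring the escaping steps in save_str."""
    tmp = s.replace("\\", "\\u005c")
    tmp = tmp.replace("\0", "\\u0000")
    tmp = tmp.replace("\n", "\\u000a")
    tmp = tmp.replace("\r", "\\u000d")
    tmp = tmp.replace("\x1a", "\\u001a")  # EOF on DOS
    return tmp.encode("raw-unicode-escape")


samples = ["", "xyz", "xyz\n", "x\\yz", "x\xa1yz\r"]
for s in samples:
    payload = escape_for_protocol0(s)
    # The protocol 0 UNICODE argument is newline-terminated,
    # so the payload itself must not contain a raw newline.
    assert b"\n" not in payload
    # Decoding with the same codec recovers the original text.
    assert payload.decode("raw-unicode-escape") == s
```

Without the extra replacements, `"xyz\n".encode("raw-unicode-escape")` would keep the raw newline and end the line-based argument early.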

3 files changed: 21 additions and 7 deletions

Lib/pickle.py

Lines changed: 7 additions & 7 deletions
@@ -855,13 +855,13 @@ def save_str(self, obj):
             else:
                 self.write(BINUNICODE + pack("<I", n) + encoded)
         else:
-            obj = obj.replace("\\", "\\u005c")
-            obj = obj.replace("\0", "\\u0000")
-            obj = obj.replace("\n", "\\u000a")
-            obj = obj.replace("\r", "\\u000d")
-            obj = obj.replace("\x1a", "\\u001a")  # EOF on DOS
-            self.write(UNICODE + obj.encode('raw-unicode-escape') +
-                       b'\n')
+            # Escape what raw-unicode-escape doesn't, but memoize the original.
+            tmp = obj.replace("\\", "\\u005c")
+            tmp = tmp.replace("\0", "\\u0000")
+            tmp = tmp.replace("\n", "\\u000a")
+            tmp = tmp.replace("\r", "\\u000d")
+            tmp = tmp.replace("\x1a", "\\u001a")  # EOF on DOS
+            self.write(UNICODE + tmp.encode('raw-unicode-escape') + b'\n')
         self.memoize(obj)
     dispatch[str] = save_str
 
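
The change above matters because the pickler's memo is keyed by object identity. The following toy model is a sketch only: `ToyPickler`, its `memoize_original` flag and its tuple-based output are hypothetical and are not the real `pickle` API. It shows how memoizing the escaped copy, instead of the original string, makes a second reference miss the memo.

```python
class ToyPickler:
    """Toy stand-in for a pickler with an id()-keyed memo (illustration only)."""

    def __init__(self, memoize_original: bool):
        self.memo = {}   # id(obj) -> (memo index, obj); storing obj keeps it alive
        self.out = []    # recorded "opcodes" as simple tuples
        self.memoize_original = memoize_original

    def save(self, obj: str) -> None:
        entry = self.memo.get(id(obj))
        if entry is not None:
            # Seen before: emit a back-reference instead of the text.
            self.out.append(("GET", entry[0]))
            return
        escaped = obj.replace("\n", "\\u000a")
        self.out.append(("UNICODE", escaped))
        # Old behaviour memoized the escaped copy; the fix memoizes the original.
        key = obj if self.memoize_original else escaped
        self.memo[id(key)] = (len(self.memo), key)


s = "xyz\n"

old = ToyPickler(memoize_original=False)
old.save(s)
old.save(s)

new = ToyPickler(memoize_original=True)
new.save(s)
new.save(s)
```

In this model `old.out` holds two `UNICODE` entries, while `new.out` holds one `UNICODE` entry followed by a `GET`. In CPython, `str.replace` returns the same object when nothing is replaced, which is presumably why strings without escaped characters were not affected by the old code.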

Lib/test/pickletester.py

Lines changed: 8 additions & 0 deletions
@@ -1825,6 +1825,14 @@ def test_unicode_high_plane(self):
             t2 = self.loads(p)
             self.assert_is_copy(t, t2)
 
+    def test_unicode_memoization(self):
+        # Repeated str is re-used (even when escapes added).
+        for proto in protocols:
+            for s in '', 'xyz', 'xyz\n', 'x\\yz', 'x\xa1yz\r':
+                p = self.dumps((s, s), proto)
+                s1, s2 = self.loads(p)
+                self.assertIs(s1, s2)
+
     def test_bytes(self):
         for proto in protocols:
             for s in b'', b'xyz', b'xyz'*100:
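
The new test can be approximated outside the test suite. This is a rough sketch, not the test itself: `shares_identity` is a hypothetical helper, and by default it uses the public `pickle.dumps` and `pickle.loads`, which normally resolve to the C implementation.

```python
import pickle


def shares_identity(s, proto, dumps=pickle.dumps, loads=pickle.loads):
    """Return True if both references to s come back as one object."""
    s1, s2 = loads(dumps((s, s), proto))
    return s1 is s2


results = {
    (s, proto): shares_identity(s, proto)
    for proto in range(pickle.HIGHEST_PROTOCOL + 1)
    for s in ("", "xyz", "xyz\n", "x\\yz", "x\xa1yz\r")
}
```

To exercise the pure-Python code path that this commit changes, the private functions `pickle._dumps` and `pickle._loads` can be passed as `dumps` and `loads`. They are implementation details of `pickle.py`, and for protocol 0 the result depends on whether the interpreter includes this fix.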
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+When a second reference to a string appears in the input to :mod:`pickle`,
+and the Python implementation is in use,
+we are guaranteed that a single copy gets pickled
+and a single object is shared when reloaded.
+Previously, in protocol 0, when a string contained certain characters
+(e.g. newline) it resulted in duplicate objects.
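
The "single copy gets pickled" claim can be inspected at the opcode level with `pickletools.genops`. The sketch below uses the public `pickle.dumps`, assumed here to be the C implementation, whose behaviour the Python implementation now matches. `opcode_names` is a hypothetical helper.

```python
import pickle
import pickletools


def opcode_names(data: bytes) -> list[str]:
    """List the opcode names in a pickle, in stream order."""
    return [op.name for op, arg, pos in pickletools.genops(data)]


s = "xyz\n"
names = opcode_names(pickle.dumps((s, s), protocol=0))

# One UNICODE opcode carries the text; the second reference is a GET.
unicode_count = names.count("UNICODE")
get_count = names.count("GET")
```

A pickle that stores the text twice would instead show two `UNICODE` opcodes and no `GET` for the string.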
