bpo-30103: Allow Uuencode in Python using backtick as zero instead of space #1326

Merged 10 commits on May 3, 2017
Changes from 2 commits
7 changes: 5 additions & 2 deletions Doc/library/binascii.rst
@@ -40,11 +40,14 @@ The :mod:`binascii` module defines the following functions:
data may be followed by whitespace.


.. function:: b2a_uu(data)
.. function:: b2a_uu(data, \*, backtick=False)
Member

AFAIK the backslash is not needed here.


Convert binary data to a line of ASCII characters, the return value is the
converted line, including a newline char. The length of *data* should be at most
45.
45. If *backtick* is true, zeros are represented by backticks instead of spaces.
Member

Show the backtick character explicitly, as

``"`"``

And in all other cases where it is mentioned.


.. versionchanged:: 3.7
Added the *backtick* parameter.


.. function:: a2b_base64(string)
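A short illustration of the documented behaviour may help here. This is only a sketch, assuming Python 3.7 or later (where the *backtick* parameter exists); the sample input is the one used in the PR's own tests. The two encodings differ only in how zero-valued characters are written, and `a2b_uu` decodes both without needing a flag:

```python
import binascii

data = b"\x00Cat"

# Default: zero-valued sextets are written as spaces.
spaced = binascii.b2a_uu(data)
# With backtick=True they are written as "`" instead.
backticked = binascii.b2a_uu(data, backtick=True)

print(spaced)
print(backticked)

# Decoding accepts either form.
assert binascii.a2b_uu(spaced) == binascii.a2b_uu(backticked) == data

# backtick is keyword-only, so passing it positionally is rejected.
try:
    binascii.b2a_uu(b"", True)
except TypeError:
    positional_rejected = True
else:
    positional_rejected = False
```

The backtick form is useful when the encoded text passes through tools that strip trailing whitespace, since no line then ends in a space.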
8 changes: 6 additions & 2 deletions Doc/library/uu.rst
@@ -28,12 +28,16 @@ This code was contributed by Lance Ellinghouse, and modified by Jack Jansen.
The :mod:`uu` module defines the following functions:


.. function:: encode(in_file, out_file, name=None, mode=None)
.. function:: encode(in_file, out_file, name=None, mode=None, \*, backtick=False)

Uuencode file *in_file* into file *out_file*. The uuencoded file will have
the header specifying *name* and *mode* as the defaults for the results of
decoding the file. The default defaults are taken from *in_file*, or ``'-'``
and ``0o666`` respectively.
and ``0o666`` respectively. If *backtick* is true, zeros are represented by
backticks instead of spaces.

.. versionchanged:: 3.7
Added the *backtick* parameter.


.. function:: decode(in_file, out_file=None, mode=None, quiet=False)
14 changes: 14 additions & 0 deletions Doc/whatsnew/3.7.rst
@@ -95,6 +95,13 @@ New Modules
Improved Modules
================

binascii
--------

The :func:`~binascii.b2a_uu` function now accepts an optional *backtick*
keyword argument. When it's true, zeros are represented by backticks
instead of spaces. (Contributed by Xiang Zhang in :issue:`30103`.)

distutils
---------

@@ -153,6 +160,13 @@ urllib.parse
adding `~` to the set of characters that is never quoted by default.
(Contributed by Christian Theune and Ratnadeep Debnath in :issue:`16285`.)

uu
--

Function :func:`~uu.encode` now accepts an optional *backtick*
keyword argument. When it's true, zeros are represented by backticks
instead of spaces. (Contributed by Xiang Zhang in :issue:`30103`.)


Optimizations
=============
36 changes: 24 additions & 12 deletions Lib/test/test_binascii.py
@@ -112,29 +112,41 @@ def addnoise(line):

def test_uu(self):
MAX_UU = 45
lines = []
for i in range(0, len(self.data), MAX_UU):
b = self.type2test(self.rawdata[i:i+MAX_UU])
a = binascii.b2a_uu(b)
lines.append(a)
res = bytes()
for line in lines:
a = self.type2test(line)
b = binascii.a2b_uu(a)
res += b
self.assertEqual(res, self.rawdata)
for backtick in (True, False):
lines = []
for i in range(0, len(self.data), MAX_UU):
b = self.type2test(self.rawdata[i:i+MAX_UU])
a = binascii.b2a_uu(b, backtick=backtick)
lines.append(a)
res = bytes()
for line in lines:
a = self.type2test(line)
b = binascii.a2b_uu(a)
res += b
self.assertEqual(res, self.rawdata)

self.assertEqual(binascii.a2b_uu(b"\x7f"), b"\x00"*31)
self.assertEqual(binascii.a2b_uu(b"\x80"), b"\x00"*32)
self.assertEqual(binascii.a2b_uu(b"\xff"), b"\x00"*31)
self.assertRaises(binascii.Error, binascii.a2b_uu, b"\xff\x00")
self.assertRaises(binascii.Error, binascii.a2b_uu, b"!!!!")

self.assertRaises(binascii.Error, binascii.b2a_uu, 46*b"!")

# Issue #7701 (crash on a pydebug build)
self.assertEqual(binascii.b2a_uu(b'x'), b'!>   \n')

self.assertEqual(binascii.b2a_uu(b''), b' \n')
self.assertEqual(binascii.b2a_uu(b'', backtick=True), b'`\n')
self.assertEqual(binascii.a2b_uu(b' \n'), b'')
self.assertEqual(binascii.a2b_uu(b'`\n'), b'')
self.assertEqual(binascii.b2a_uu(b'\x00Cat'), b'$ $-A=   \n')
self.assertEqual(binascii.b2a_uu(b'\x00Cat', backtick=True),
b'$`$-A=```\n')
self.assertEqual(binascii.a2b_uu(b'$`$-A=```\n'),
binascii.a2b_uu(b'$ $-A=   \n'))
with self.assertRaises(TypeError):
binascii.b2a_uu(b"", True)

def test_crc_hqx(self):
crc = binascii.crc_hqx(self.type2test(b"Test the CRC-32 of"), 0)
crc = binascii.crc_hqx(self.type2test(b" this string."), crc)
81 changes: 49 additions & 32 deletions Lib/test/test_uu.py
@@ -10,11 +10,11 @@
import uu
import io

plaintext = b"The smooth-scaled python crept over the sleeping dog\n"
plaintext = b"The symbols on top of your keyboard are !@#$%^&*()_+|~\n"

encodedtext = b"""\
M5&AE('-M;V]T:\"US8V%L960@<'ET:&]N(&-R97!T(&]V97(@=&AE('-L965P
(:6YG(&1O9PH """
M5&AE(\'-Y;6)O;\',@;VX@=&]P(&]F(\'EO=7(@:V5Y8F]A<F0@87)E("% (R0E
Member
@vadmium Apr 29, 2017

I don’t think you need to escape the quotes (\')

*7B8J*"E?*WQ^"@  """

# Stolen from io.py
class FakeIO(io.TextIOWrapper):
@@ -44,9 +44,14 @@ def getvalue(self):
return self.buffer.getvalue().decode(self._encoding, self._errors)


def encodedtextwrapped(mode, filename):
return (bytes("begin %03o %s\n" % (mode, filename), "ascii") +
encodedtext + b"\n \nend\n")
def encodedtextwrapped(mode, filename, backtick=False):
if backtick:
res = (bytes("begin %03o %s\n" % (mode, filename), "ascii") +
encodedtext.replace(b' ', b'`') + b"\n`\nend\n")
Member

It seems the only space in encodedtext is the padding space. It would be worth changing the examples so that they include inner spaces.

Member Author

Good catch. I noticed it too when writing the test but didn't change it.
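The point about inner spaces can be checked mechanically. The following is a rough sketch, not part of the PR: the helper name `inner_zero_chars` is hypothetical, and it assumes Python 3.7+ for the *backtick* keyword. It counts zero-valued characters that carry data bits (ignoring characters that are pure padding), using the old and new `plaintext` values from this diff:

```python
import binascii

def inner_zero_chars(plaintext: bytes) -> int:
    """Count zero-valued uuencode characters that carry data bits.

    Hypothetical helper for illustration only.
    """
    total = 0
    for start in range(0, len(plaintext), 45):
        chunk = plaintext[start:start + 45]
        # With backtick=True every zero-valued character shows up as "`".
        line = binascii.b2a_uu(chunk, backtick=True)
        # Characters beyond this count hold no data bits, only padding.
        data_chars = (len(chunk) * 8 + 5) // 6
        # line[0] is the length character, so the data starts at index 1.
        total += line[1:1 + data_chars].count(b"`")
    return total

old_plaintext = b"The smooth-scaled python crept over the sleeping dog\n"
new_plaintext = b"The symbols on top of your keyboard are !@#$%^&*()_+|~\n"

print(inner_zero_chars(old_plaintext))
print(inner_zero_chars(new_plaintext))
```

A sample with no inner zeros would let a buggy encoder that only rewrites the padding pass the test, which is why the new plaintext includes `@` (0x40, whose low six bits are zero).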

else:
res = (bytes("begin %03o %s\n" % (mode, filename), "ascii") +
encodedtext + b"\n \nend\n")
return res

class UUTest(unittest.TestCase):

@@ -59,20 +64,27 @@ def test_encode(self):
out = io.BytesIO()
uu.encode(inp, out, "t1", 0o644)
self.assertEqual(out.getvalue(), encodedtextwrapped(0o644, "t1"))
inp = io.BytesIO(plaintext)
out = io.BytesIO()
uu.encode(inp, out, "t1", backtick=True)
self.assertEqual(out.getvalue(), encodedtextwrapped(0o666, "t1", True))
with self.assertRaises(TypeError):
uu.encode(inp, out, "t1", 0o644, True)

def test_decode(self):
inp = io.BytesIO(encodedtextwrapped(0o666, "t1"))
out = io.BytesIO()
uu.decode(inp, out)
self.assertEqual(out.getvalue(), plaintext)
inp = io.BytesIO(
b"UUencoded files may contain many lines,\n" +
b"even some that have 'begin' in them.\n" +
encodedtextwrapped(0o666, "t1")
)
out = io.BytesIO()
uu.decode(inp, out)
self.assertEqual(out.getvalue(), plaintext)
for backtick in True, False:
inp = io.BytesIO(encodedtextwrapped(0o666, "t1", backtick=backtick))
out = io.BytesIO()
uu.decode(inp, out)
self.assertEqual(out.getvalue(), plaintext)
inp = io.BytesIO(
b"UUencoded files may contain many lines,\n" +
b"even some that have 'begin' in them.\n" +
encodedtextwrapped(0o666, "t1", backtick=backtick)
)
out = io.BytesIO()
uu.decode(inp, out)
self.assertEqual(out.getvalue(), plaintext)

def test_truncatedinput(self):
inp = io.BytesIO(b"begin 644 t1\n" + encodedtext)
@@ -94,25 +106,33 @@ def test_missingbegin(self):

def test_garbage_padding(self):
# Issue #22406
encodedtext = (
encodedtext1 = (
b"begin 644 file\n"
# length 1; bits 001100 111111 111111 111111
b"\x21\x2C\x5F\x5F\x5F\n"
b"\x20\n"
b"end\n"
)
encodedtext2 = (
b"begin 644 file\n"
# length 1; bits 001100 111111 111111 111111
b"\x21\x2C\x5F\x5F\x5F\n"
b"\x60\n"
b"end\n"
)
plaintext = b"\x33" # 00110011

with self.subTest("uu.decode()"):
inp = io.BytesIO(encodedtext)
out = io.BytesIO()
uu.decode(inp, out, quiet=True)
self.assertEqual(out.getvalue(), plaintext)
for encodedtext in encodedtext1, encodedtext2:
with self.subTest("uu.decode()"):
inp = io.BytesIO(encodedtext)
out = io.BytesIO()
uu.decode(inp, out, quiet=True)
self.assertEqual(out.getvalue(), plaintext)

with self.subTest("uu_codec"):
import codecs
decoded = codecs.decode(encodedtext, "uu_codec")
self.assertEqual(decoded, plaintext)
with self.subTest("uu_codec"):
import codecs
decoded = codecs.decode(encodedtext, "uu_codec")
self.assertEqual(decoded, plaintext)

class UUStdIOTest(unittest.TestCase):

@@ -251,10 +271,7 @@ def test_decodetwice(self):
self._kill(f)

def test_main():
support.run_unittest(UUTest,
UUStdIOTest,
UUFileTest,
)
support.run_unittest(UUTest, UUStdIOTest, UUFileTest)
Member

This seems like an insignificant change. Wouldn’t it be simpler to use “unittest.main” instead?


if __name__=="__main__":
test_main()
13 changes: 8 additions & 5 deletions Lib/uu.py
@@ -26,8 +26,8 @@

"""Implementation of the UUencode and UUdecode functions.

encode(in_file, out_file [,name, mode])
decode(in_file [, out_file, mode])
encode(in_file, out_file [,name, mode], *, backtick=False)
decode(in_file [, out_file, mode, quiet])
"""

import binascii
@@ -39,7 +39,7 @@
class Error(Exception):
pass

def encode(in_file, out_file, name=None, mode=None):
def encode(in_file, out_file, name=None, mode=None, *, backtick=False):
"""Uuencode file"""
#
# If in_file is a pathname open it and change defaults
@@ -79,9 +79,12 @@ def encode(in_file, out_file, name=None, mode=None):
out_file.write(('begin %o %s\n' % ((mode & 0o777), name)).encode("ascii"))
data = in_file.read(45)
while len(data) > 0:
out_file.write(binascii.b2a_uu(data))
out_file.write(binascii.b2a_uu(data, backtick=backtick))
data = in_file.read(45)
out_file.write(b' \nend\n')
if backtick:
out_file.write(b'`\nend\n')
else:
out_file.write(b' \nend\n')
finally:
for f in opened_files:
f.close()
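The encoding loop in this hunk can be exercised on its own. The sketch below is a hypothetical stand-in (`uuencode_stream` is not a real function in `uu`), built only on `binascii` so that it does not depend on the `uu` module being available; it assumes Python 3.7+ and already-open binary file objects, and skips the pathname handling that the real `encode()` performs:

```python
import binascii
import io

def uuencode_stream(in_file, out_file, name="-", mode=0o666, *, backtick=False):
    """Hypothetical stand-in mirroring the loop in the diff above."""
    out_file.write(("begin %o %s\n" % (mode & 0o777, name)).encode("ascii"))
    data = in_file.read(45)
    while data:
        # Each call encodes at most 45 bytes into one output line.
        out_file.write(binascii.b2a_uu(data, backtick=backtick))
        data = in_file.read(45)
    # The zero-length terminator line follows the same convention as the body.
    out_file.write(b"`\nend\n" if backtick else b" \nend\n")

out = io.BytesIO()
uuencode_stream(io.BytesIO(b"\x00Cat"), out, "t1", 0o644, backtick=True)
encoded = out.getvalue()
print(encoded)
```

Note that the terminator line has to be switched separately from the `b2a_uu` calls, because it is written as a literal rather than produced by encoding empty data.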
3 changes: 3 additions & 0 deletions Misc/NEWS
@@ -317,6 +317,9 @@ Extension Modules
Library
-------

- bpo-30103: binascii.b2a_uu() and uu.encode() now support using backtick
as zero instead of space.

- bpo-30101: Add support for curses.A_ITALIC.

- bpo-29822: inspect.isabstract() now works during __init_subclass__. Patch
17 changes: 12 additions & 5 deletions Modules/binascii.c
@@ -334,14 +334,15 @@ binascii_a2b_uu_impl(PyObject *module, Py_buffer *data)
binascii.b2a_uu

data: Py_buffer
/
Member

Was this intended? If so, it needs documenting and testing, but it would be good to avoid if possible. Can’t you put a slash and asterisk after each other to have positional-only and then keyword-only parameters?

Member Author

Argument Clinic (AC) currently doesn't seem to support combining positional-only parameters with keyword-only parameters. So it's not intended; it's a side effect.

Member

It does, I just checked this.

Member Author

Ohh, nice.

Member Author
@zhangyangyu Apr 29, 2017

I support keeping data positional-only, but in bpo-25357 b2a_base64 was changed and, as a side effect, its data parameter now accepts a keyword argument. That broke the consistency of every function in the binascii module taking a positional-only data. What do you think, @serhiy-storchaka and @vadmium? Should we keep data positional-only here, make it like b2a_base64, or change b2a_base64's data back?

Member

It looks to me like the change in bpo-25357 was not intentional. Perhaps we can revert it. Please open a new issue for this.
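For readers less familiar with the distinction discussed in this thread, the calling convention under debate (a positional-only `data` followed by a keyword-only `backtick`) can be sketched in pure Python. This assumes Python 3.8+ for the `/` syntax; `b2a_uu_sketch` and `rejected` are hypothetical names, and the real function is declared through Argument Clinic input rather than like this:

```python
import binascii

def b2a_uu_sketch(data, /, *, backtick=False):
    """Hypothetical wrapper: data is positional-only, backtick keyword-only."""
    return binascii.b2a_uu(data, backtick=backtick)

def rejected(*args, **kwargs):
    """Return True if the call signature is refused with TypeError."""
    try:
        b2a_uu_sketch(*args, **kwargs)
    except TypeError:
        return True
    return False

print(b2a_uu_sketch(b"x", backtick=True))
print(rejected(data=b"x"))   # data passed by keyword
print(rejected(b"x", True))  # backtick passed positionally
```

With this shape, neither `data=...` nor a positional second argument is accepted, which is the behaviour the reviewers were asking to preserve for `data`.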

*
backtick: bool(accept={int}) = False

Uuencode line of data.
[clinic start generated code]*/

static PyObject *
binascii_b2a_uu_impl(PyObject *module, Py_buffer *data)
/*[clinic end generated code: output=0070670e52e4aa6b input=00fdf458ce8b465b]*/
binascii_b2a_uu_impl(PyObject *module, Py_buffer *data, int backtick)
/*[clinic end generated code: output=b1b99de62d9bbeb8 input=141f61b6ceb56af6]*/
{
unsigned char *ascii_data;
const unsigned char *bin_data;
@@ -367,7 +368,10 @@ binascii_b2a_uu_impl(PyObject *module, Py_buffer *data)
return NULL;

/* Store the length */
*ascii_data++ = ' ' + (bin_len & 077);
if (backtick && !bin_len)
*ascii_data++ = '`';
else
*ascii_data++ = ' ' + (bin_len & 077);
Member

Actually & 077 is not needed. bin_len is not larger than 45.


for( ; bin_len > 0 || leftbits != 0 ; bin_len--, bin_data++ ) {
/* Shift the data (or padding) into our buffer */
@@ -381,7 +385,10 @@ binascii_b2a_uu_impl(PyObject *module, Py_buffer *data)
while ( leftbits >= 6 ) {
this_ch = (leftchar >> (leftbits-6)) & 0x3f;
leftbits -= 6;
*ascii_data++ = this_ch + ' ';
if (backtick && !this_ch)
*ascii_data++ = '`';
else
*ascii_data++ = this_ch + ' ';
}
}
*ascii_data++ = '\n'; /* Append a courtesy newline */
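The character mapping that the C change introduces can be mirrored in a few lines of Python. This is a sketch for illustration, not the module's implementation: `b2a_uu_py` is a hypothetical name, it raises `ValueError` where the C code raises `binascii.Error`, and it pads the input to a multiple of three bytes instead of tracking leftover bits, which yields the same output:

```python
import binascii

def b2a_uu_py(data: bytes, *, backtick: bool = False) -> bytes:
    """Hypothetical pure-Python equivalent of one uuencoded line."""
    if len(data) > 45:
        raise ValueError("At most 45 bytes at once")

    def to_char(value: int) -> int:
        # Zero becomes "`" (0x60) when backtick is set; every other value
        # is offset from the space character (0x20).
        if backtick and value == 0:
            return ord("`")
        return ord(" ") + value

    # The first character stores the length. It is at most 45, so masking
    # it with 0o77 would not change it.
    out = bytearray([to_char(len(data))])

    # Zero-pad to a whole number of 3-byte groups, then emit four
    # 6-bit characters per group.
    padded = data + b"\x00" * (-len(data) % 3)
    for i in range(0, len(padded), 3):
        group = int.from_bytes(padded[i:i + 3], "big")
        for shift in (18, 12, 6, 0):
            out.append(to_char((group >> shift) & 0x3F))

    out.append(ord("\n"))  # courtesy newline, as in the C code
    return bytes(out)

print(b2a_uu_py(b"\x00Cat"))
print(b2a_uu_py(b"\x00Cat", backtick=True))
```

Because `0x60 & 0x3F` is zero, a decoder that masks each character to six bits treats the backtick and the space identically, which is why only the encoder needed a new option.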
18 changes: 11 additions & 7 deletions Modules/clinic/binascii.c.h
@@ -34,27 +34,31 @@ binascii_a2b_uu(PyObject *module, PyObject *arg)
}

PyDoc_STRVAR(binascii_b2a_uu__doc__,
"b2a_uu($module, data, /)\n"
"b2a_uu($module, /, data, *, backtick=False)\n"
"--\n"
"\n"
"Uuencode line of data.");

#define BINASCII_B2A_UU_METHODDEF \
{"b2a_uu", (PyCFunction)binascii_b2a_uu, METH_O, binascii_b2a_uu__doc__},
{"b2a_uu", (PyCFunction)binascii_b2a_uu, METH_FASTCALL, binascii_b2a_uu__doc__},

static PyObject *
binascii_b2a_uu_impl(PyObject *module, Py_buffer *data);
binascii_b2a_uu_impl(PyObject *module, Py_buffer *data, int backtick);

static PyObject *
binascii_b2a_uu(PyObject *module, PyObject *arg)
binascii_b2a_uu(PyObject *module, PyObject **args, Py_ssize_t nargs, PyObject *kwnames)
{
PyObject *return_value = NULL;
static const char * const _keywords[] = {"data", "backtick", NULL};
static _PyArg_Parser _parser = {"y*|$i:b2a_uu", _keywords, 0};
Py_buffer data = {NULL, NULL};
int backtick = 0;

if (!PyArg_Parse(arg, "y*:b2a_uu", &data)) {
if (!_PyArg_ParseStackAndKeywords(args, nargs, kwnames, &_parser,
&data, &backtick)) {
goto exit;
}
return_value = binascii_b2a_uu_impl(module, &data);
return_value = binascii_b2a_uu_impl(module, &data, backtick);

exit:
/* Cleanup for data */
@@ -558,4 +562,4 @@ binascii_b2a_qp(PyObject *module, PyObject **args, Py_ssize_t nargs, PyObject *k

return return_value;
}
/*[clinic end generated code: output=4a418f883ccc79fe input=a9049054013a1b77]*/
/*[clinic end generated code: output=25820051c57501c7 input=a9049054013a1b77]*/