Commit c7f02a9

bpo-33671 / shutil.copyfile: use memoryview() with dynamic size on Windows (#7681)
bpo-33671
* use memoryview() with size == file size on Windows, see #7160 (comment)
* release intermediate (sliced) memoryview immediately
* replace "OSX" occurrences with "macOS"
* add some unittests for copyfileobj()
1 parent 936f03e commit c7f02a9

7 files changed: +153 -64 lines changed

Doc/library/shutil.rst

Lines changed: 5 additions & 1 deletion
@@ -407,11 +407,15 @@ efficiently (see :issue:`33671`).
 "fast-copy" means that the copying operation occurs within the kernel, avoiding
 the use of userspace buffers in Python as in "``outfd.write(infd.read())``".
 
-On OSX `fcopyfile`_ is used to copy the file content (not metadata).
+On macOS `fcopyfile`_ is used to copy the file content (not metadata).
 
 On Linux, Solaris and other POSIX platforms where :func:`os.sendfile` supports
 copies between 2 regular file descriptors :func:`os.sendfile` is used.
 
+On Windows :func:`shutil.copyfile` uses a bigger default buffer size (1 MiB
+instead of 16 KiB) and a :func:`memoryview`-based variant of
+:func:`shutil.copyfileobj` is used.
+
 If the fast-copy operation fails and no data was written in the destination
 file then shutil will silently fallback on using less efficient
 :func:`copyfileobj` function internally.
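
The Windows behaviour described in this hunk can be sketched roughly as below. This is an illustration under stated assumptions, not the code from the commit: `copy_readinto`, `MAX_BUFSIZE` and `size_hint` are names invented here, the source object is assumed to support `readinto()`, and both streams are assumed to be binary.

```python
import io

MAX_BUFSIZE = 1024 * 1024  # assumed cap, mirroring the 1 MiB figure in the docs


def copy_readinto(fsrc, fdst, size_hint):
    """Copy fsrc to fdst through one reusable buffer (hypothetical helper)."""
    # Size the buffer to the data when it is small, but never above the cap.
    length = max(1, min(size_hint, MAX_BUFSIZE))
    copied = 0
    with memoryview(bytearray(length)) as mv:
        while True:
            n = fsrc.readinto(mv)
            if not n:
                break  # EOF
            if n < length:
                # Short read: write only the filled part and release the
                # sliced view as soon as the write is done.
                with mv[:n] as smv:
                    fdst.write(smv)
            else:
                fdst.write(mv)
            copied += n
    return copied


payload = b"x" * 2500
src, dst = io.BytesIO(payload), io.BytesIO()
copied = copy_readinto(src, dst, size_hint=1000)
```

The buffer is allocated once and reused for every read, so no new `bytes` object is created per chunk, unlike a `read()`/`write()` loop.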

Doc/whatsnew/3.8.rst

Lines changed: 17 additions & 11 deletions
@@ -95,20 +95,18 @@ Optimizations
 
 * :func:`shutil.copyfile`, :func:`shutil.copy`, :func:`shutil.copy2`,
   :func:`shutil.copytree` and :func:`shutil.move` use platform-specific
-  "fast-copy" syscalls on Linux, OSX and Solaris in order to copy the file more
-  efficiently.
+  "fast-copy" syscalls on Linux, macOS and Solaris in order to copy the file
+  more efficiently.
   "fast-copy" means that the copying operation occurs within the kernel,
   avoiding the use of userspace buffers in Python as in
   "``outfd.write(infd.read())``".
-  All other platforms not using such technique will rely on a faster
-  :func:`shutil.copyfile` implementation using :func:`memoryview`,
-  :class:`bytearray` and
-  :meth:`BufferedIOBase.readinto() <io.BufferedIOBase.readinto>`.
-  Finally, :func:`shutil.copyfile` default buffer size on Windows was increased
-  from 16KB to 1MB.
-  The speedup for copying a 512MB file within the same partition is about +26%
-  on Linux, +50% on OSX and +38% on Windows. Also, much less CPU cycles are
-  consumed.
+  On Windows :func:`shutil.copyfile` uses a bigger default buffer size (1 MiB
+  instead of 16 KiB) and a :func:`memoryview`-based variant of
+  :func:`shutil.copyfileobj` is used.
+  The speedup for copying a 512 MiB file within the same partition is about
+  +26% on Linux, +50% on macOS and +40% on Windows. Also, much less CPU cycles
+  are consumed.
+  See :ref:`shutil-platform-dependent-efficient-copy-operations` section.
   (Contributed by Giampaolo Rodola' in :issue:`25427`.)
 
 * The default protocol in the :mod:`pickle` module is now Protocol 4,
@@ -179,6 +177,14 @@ Changes in the Python API
 * The :class:`cProfile.Profile` class can now be used as a context
   manager. (Contributed by Scott Sanderson in :issue:`29235`.)
 
+* :func:`shutil.copyfile`, :func:`shutil.copy`, :func:`shutil.copy2`,
+  :func:`shutil.copytree` and :func:`shutil.move` use platform-specific
+  "fast-copy" syscalls (see
+  :ref:`shutil-platform-dependent-efficient-copy-operations` section).
+
+* :func:`shutil.copyfile` default buffer size on Windows was changed from
+  16 KiB to 1 MiB.
+
 CPython bytecode changes
 ------------------------

Lib/shutil.py

Lines changed: 39 additions & 33 deletions
@@ -43,15 +43,16 @@
 except ImportError:
     getgrnam = None
 
+_WINDOWS = os.name == 'nt'
 posix = nt = None
 if os.name == 'posix':
     import posix
-elif os.name == 'nt':
+elif _WINDOWS:
     import nt
 
-COPY_BUFSIZE = 1024 * 1024 if os.name == 'nt' else 16 * 1024
+COPY_BUFSIZE = 1024 * 1024 if _WINDOWS else 16 * 1024
 _HAS_SENDFILE = posix and hasattr(os, "sendfile")
-_HAS_FCOPYFILE = posix and hasattr(posix, "_fcopyfile")  # OSX
+_HAS_FCOPYFILE = posix and hasattr(posix, "_fcopyfile")  # macOS
 
 __all__ = ["copyfileobj", "copyfile", "copymode", "copystat", "copy", "copy2",
            "copytree", "move", "rmtree", "Error", "SpecialFileError",
@@ -88,9 +89,9 @@ class _GiveupOnFastCopy(Exception):
     file copy when fast-copy functions fail to do so.
     """
 
-def _fastcopy_osx(fsrc, fdst, flags):
+def _fastcopy_fcopyfile(fsrc, fdst, flags):
     """Copy a regular file content or metadata by using high-performance
-    fcopyfile(3) syscall (OSX).
+    fcopyfile(3) syscall (macOS).
     """
     try:
         infd = fsrc.fileno()
@@ -168,8 +169,11 @@ def _fastcopy_sendfile(fsrc, fdst):
                 break  # EOF
             offset += sent
 
-def _copybinfileobj(fsrc, fdst, length=COPY_BUFSIZE):
-    """Copy 2 regular file objects open in binary mode."""
+def _copyfileobj_readinto(fsrc, fdst, length=COPY_BUFSIZE):
+    """readinto()/memoryview() based variant of copyfileobj().
+    *fsrc* must support readinto() method and both files must be
+    open in binary mode.
+    """
     # Localize variable access to minimize overhead.
     fsrc_readinto = fsrc.readinto
     fdst_write = fdst.write
@@ -179,28 +183,21 @@ def _copybinfileobj(fsrc, fdst, length=COPY_BUFSIZE):
             if not n:
                 break
             elif n < length:
-                fdst_write(mv[:n])
+                with mv[:n] as smv:
+                    fdst.write(smv)
             else:
                 fdst_write(mv)
 
-def _is_binary_files_pair(fsrc, fdst):
-    return hasattr(fsrc, 'readinto') and \
-        isinstance(fsrc, io.BytesIO) or 'b' in getattr(fsrc, 'mode', '') and \
-        isinstance(fdst, io.BytesIO) or 'b' in getattr(fdst, 'mode', '')
-
 def copyfileobj(fsrc, fdst, length=COPY_BUFSIZE):
     """copy data from file-like object fsrc to file-like object fdst"""
-    if _is_binary_files_pair(fsrc, fdst):
-        _copybinfileobj(fsrc, fdst, length=length)
-    else:
-        # Localize variable access to minimize overhead.
-        fsrc_read = fsrc.read
-        fdst_write = fdst.write
-        while 1:
-            buf = fsrc_read(length)
-            if not buf:
-                break
-            fdst_write(buf)
+    # Localize variable access to minimize overhead.
+    fsrc_read = fsrc.read
+    fdst_write = fdst.write
+    while True:
+        buf = fsrc_read(length)
+        if not buf:
+            break
+        fdst_write(buf)
 
 def _samefile(src, dst):
     # Macintosh, Unix.
@@ -215,7 +212,7 @@ def _samefile(src, dst):
             os.path.normcase(os.path.abspath(dst)))
 
 def copyfile(src, dst, *, follow_symlinks=True):
-    """Copy data from src to dst.
+    """Copy data from src to dst in the most efficient way possible.
 
     If follow_symlinks is not set and src is a symbolic link, a new
     symlink will be created instead of copying the file it points to.
@@ -224,7 +221,8 @@ def copyfile(src, dst, *, follow_symlinks=True):
     if _samefile(src, dst):
         raise SameFileError("{!r} and {!r} are the same file".format(src, dst))
 
-    for fn in [src, dst]:
+    file_size = 0
+    for i, fn in enumerate([src, dst]):
         try:
             st = os.stat(fn)
         except OSError:
@@ -234,26 +232,34 @@ def copyfile(src, dst, *, follow_symlinks=True):
             # XXX What about other special files? (sockets, devices...)
             if stat.S_ISFIFO(st.st_mode):
                 raise SpecialFileError("`%s` is a named pipe" % fn)
+            if _WINDOWS and i == 0:
+                file_size = st.st_size
 
     if not follow_symlinks and os.path.islink(src):
         os.symlink(os.readlink(src), dst)
     else:
         with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
-            if _HAS_SENDFILE:
+            # macOS
+            if _HAS_FCOPYFILE:
                 try:
-                    _fastcopy_sendfile(fsrc, fdst)
+                    _fastcopy_fcopyfile(fsrc, fdst, posix._COPYFILE_DATA)
                     return dst
                 except _GiveupOnFastCopy:
                     pass
-
-            if _HAS_FCOPYFILE:
+            # Linux / Solaris
+            elif _HAS_SENDFILE:
                 try:
-                    _fastcopy_osx(fsrc, fdst, posix._COPYFILE_DATA)
+                    _fastcopy_sendfile(fsrc, fdst)
                     return dst
                 except _GiveupOnFastCopy:
                     pass
+            # Windows, see:
+            # https://github.com/python/cpython/pull/7160#discussion_r195405230
+            elif _WINDOWS and file_size > 0:
+                _copyfileobj_readinto(fsrc, fdst, min(file_size, COPY_BUFSIZE))
+                return dst
 
-            _copybinfileobj(fsrc, fdst)
+            copyfileobj(fsrc, fdst)
 
     return dst

@@ -1147,7 +1153,7 @@ def disk_usage(path):
         used = (st.f_blocks - st.f_bfree) * st.f_frsize
         return _ntuple_diskusage(total, used, free)
 
-elif os.name == 'nt':
+elif _WINDOWS:
 
     __all__.append('disk_usage')
     _ntuple_diskusage = collections.namedtuple('usage', 'total used free')
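
The order in which the reworked `copyfile()` picks a copy strategy can be summarised with a small sketch. The function `pick_copy_strategy` and its string labels are hypothetical and only mirror the `if`/`elif` chain in the hunk above; in the real code a fast-copy attempt that raises `_GiveupOnFastCopy` falls through to the plain `copyfileobj()` loop.

```python
def pick_copy_strategy(has_fcopyfile, has_sendfile, is_windows, file_size):
    """Return the first strategy that would be tried (hypothetical helper)."""
    if has_fcopyfile:
        return "fcopyfile"      # macOS
    elif has_sendfile:
        return "sendfile"       # Linux / Solaris
    elif is_windows and file_size > 0:
        return "readinto"       # Windows, non-empty source file
    return "copyfileobj"        # generic read()/write() loop


def readinto_buffer_length(file_size, copy_bufsize=1024 * 1024):
    """Buffer length handed to the readinto() variant on Windows."""
    return min(file_size, copy_bufsize)
```

Note that an empty file on Windows skips the `readinto()` variant, since a zero-length buffer would be useless, and goes straight to the generic loop.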

Lib/test/test_shutil.py

Lines changed: 81 additions & 7 deletions
@@ -33,7 +33,7 @@
 from test.support import TESTFN, FakePath
 
 TESTFN2 = TESTFN + "2"
-OSX = sys.platform.startswith("darwin")
+MACOS = sys.platform.startswith("darwin")
 try:
     import grp
     import pwd
@@ -1808,7 +1808,7 @@ def _open(filename, mode='r'):
 
         self.assertRaises(OSError, shutil.copyfile, 'srcfile', 'destfile')
 
-    @unittest.skipIf(OSX, "skipped on OSX")
+    @unittest.skipIf(MACOS, "skipped on macOS")
     def test_w_dest_open_fails(self):
 
         srcfile = self.Faux()
@@ -1828,7 +1828,7 @@ def _open(filename, mode='r'):
         self.assertEqual(srcfile._exited_with[1].args,
                          ('Cannot open "destfile"',))
 
-    @unittest.skipIf(OSX, "skipped on OSX")
+    @unittest.skipIf(MACOS, "skipped on macOS")
     def test_w_dest_close_fails(self):
 
         srcfile = self.Faux()
@@ -1851,7 +1851,7 @@ def _open(filename, mode='r'):
         self.assertEqual(srcfile._exited_with[1].args,
                          ('Cannot close',))
 
-    @unittest.skipIf(OSX, "skipped on OSX")
+    @unittest.skipIf(MACOS, "skipped on macOS")
     def test_w_source_close_fails(self):
 
         srcfile = self.Faux(True)
@@ -1892,6 +1892,80 @@ def test_move_dir_caseinsensitive(self):
             os.rmdir(dst_dir)
 
 
+class TestCopyFileObj(unittest.TestCase):
+    FILESIZE = 2 * 1024 * 1024
+
+    @classmethod
+    def setUpClass(cls):
+        write_test_file(TESTFN, cls.FILESIZE)
+
+    @classmethod
+    def tearDownClass(cls):
+        support.unlink(TESTFN)
+        support.unlink(TESTFN2)
+
+    def tearDown(self):
+        support.unlink(TESTFN2)
+
+    @contextlib.contextmanager
+    def get_files(self):
+        with open(TESTFN, "rb") as src:
+            with open(TESTFN2, "wb") as dst:
+                yield (src, dst)
+
+    def assert_files_eq(self, src, dst):
+        with open(src, 'rb') as fsrc:
+            with open(dst, 'rb') as fdst:
+                self.assertEqual(fsrc.read(), fdst.read())
+
+    def test_content(self):
+        with self.get_files() as (src, dst):
+            shutil.copyfileobj(src, dst)
+        self.assert_files_eq(TESTFN, TESTFN2)
+
+    def test_file_not_closed(self):
+        with self.get_files() as (src, dst):
+            shutil.copyfileobj(src, dst)
+            assert not src.closed
+            assert not dst.closed
+
+    def test_file_offset(self):
+        with self.get_files() as (src, dst):
+            shutil.copyfileobj(src, dst)
+            self.assertEqual(src.tell(), self.FILESIZE)
+            self.assertEqual(dst.tell(), self.FILESIZE)
+
+    @unittest.skipIf(os.name != 'nt', "Windows only")
+    def test_win_impl(self):
+        # Make sure alternate Windows implementation is called.
+        with unittest.mock.patch("shutil._copyfileobj_readinto") as m:
+            shutil.copyfile(TESTFN, TESTFN2)
+        assert m.called
+
+        # File size is 2 MiB but max buf size should be 1 MiB.
+        self.assertEqual(m.call_args[0][2], 1 * 1024 * 1024)
+
+        # If file size < 1 MiB memoryview() length must be equal to
+        # the actual file size.
+        with tempfile.NamedTemporaryFile(delete=False) as f:
+            f.write(b'foo')
+        fname = f.name
+        self.addCleanup(support.unlink, fname)
+        with unittest.mock.patch("shutil._copyfileobj_readinto") as m:
+            shutil.copyfile(fname, TESTFN2)
+        self.assertEqual(m.call_args[0][2], 3)
+
+        # Empty files should not rely on readinto() variant.
+        with tempfile.NamedTemporaryFile(delete=False) as f:
+            pass
+        fname = f.name
+        self.addCleanup(support.unlink, fname)
+        with unittest.mock.patch("shutil._copyfileobj_readinto") as m:
+            shutil.copyfile(fname, TESTFN2)
+        assert not m.called
+        self.assert_files_eq(fname, TESTFN2)
+
+
 class _ZeroCopyFileTest(object):
     """Tests common to all zero-copy APIs."""
     FILESIZE = (10 * 1024 * 1024)  # 10 MiB
@@ -2111,12 +2185,12 @@ def test_file2file_not_supported(self):
             shutil._HAS_SENDFILE = True
 
 
-@unittest.skipIf(not OSX, 'OSX only')
-class TestZeroCopyOSX(_ZeroCopyFileTest, unittest.TestCase):
+@unittest.skipIf(not MACOS, 'macOS only')
+class TestZeroCopyMACOS(_ZeroCopyFileTest, unittest.TestCase):
     PATCHPOINT = "posix._fcopyfile"
 
     def zerocopy_fun(self, src, dst):
-        return shutil._fastcopy_osx(src, dst, posix._COPYFILE_DATA)
+        return shutil._fastcopy_fcopyfile(src, dst, posix._COPYFILE_DATA)
 
 
 class TermsizeTests(unittest.TestCase):
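
The mock-based check used by `test_win_impl` (inspecting the third positional argument of the patched copier) can be tried on any platform with a stand-in. In this sketch `copy_with_sized_buffer`, `recorded_length` and `BUFSIZE_CAP` are hypothetical names; the stand-in only imitates the Windows branch of `copyfile()` and is not the shutil code itself.

```python
import os
import tempfile
from unittest import mock

BUFSIZE_CAP = 1024 * 1024  # assumed cap, matching the 1 MiB used in the test


def copy_with_sized_buffer(src_path, dst_path, copier):
    """Hypothetical stand-in for the Windows branch of copyfile()."""
    file_size = os.stat(src_path).st_size
    with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
        if file_size > 0:
            # Buffer length is the file size, capped at BUFSIZE_CAP.
            copier(fsrc, fdst, min(file_size, BUFSIZE_CAP))
    return dst_path


def recorded_length(payload):
    """Run the copy with a mock copier and return the length it received."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        dst = os.path.join(tmp, "dst")
        with open(src, "wb") as f:
            f.write(payload)
        copier = mock.Mock()
        copy_with_sized_buffer(src, dst, copier)
        # call_args[0] holds the positional arguments of the last call.
        return copier.call_args[0][2] if copier.called else None


small = recorded_length(b"foo")
large = recorded_length(b"\0" * (2 * 1024 * 1024))
empty = recorded_length(b"")
```

Passing the copier in as a parameter keeps the sketch independent of private shutil names; the real test patches `shutil._copyfileobj_readinto` instead.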
Lines changed: 7 additions & 8 deletions
@@ -1,11 +1,10 @@
 :func:`shutil.copyfile`, :func:`shutil.copy`, :func:`shutil.copy2`,
 :func:`shutil.copytree` and :func:`shutil.move` use platform-specific
-fast-copy syscalls on Linux, Solaris and OSX in order to copy the file
-more efficiently. All other platforms not using such technique will rely on a
-faster :func:`shutil.copyfile` implementation using :func:`memoryview`,
-:class:`bytearray` and
-:meth:`BufferedIOBase.readinto() <io.BufferedIOBase.readinto>`.
-Finally, :func:`shutil.copyfile` default buffer size on Windows was increased
-from 16KB to 1MB. The speedup for copying a 512MB file is about +26% on Linux,
-+50% on OSX and +38% on Windows. Also, much less CPU cycles are consumed
+fast-copy syscalls on Linux, Solaris and macOS in order to copy the file
+more efficiently.
+On Windows :func:`shutil.copyfile` uses a bigger default buffer size (1 MiB
+instead of 16 KiB) and a :func:`memoryview`-based variant of
+:func:`shutil.copyfileobj` is used.
+The speedup for copying a 512MiB file is about +26% on Linux, +50% on macOS and
++40% on Windows. Also, much less CPU cycles are consumed.
 (Contributed by Giampaolo Rodola' in :issue:`25427`.)

Modules/clinic/posixmodule.c.h

Lines changed: 2 additions & 2 deletions
Some generated files are not rendered by default.

Modules/posixmodule.c

Lines changed: 2 additions & 2 deletions
@@ -8774,12 +8774,12 @@ os._fcopyfile
     flags: int
     /
 
-Efficiently copy content or metadata of 2 regular file descriptors (OSX).
+Efficiently copy content or metadata of 2 regular file descriptors (macOS).
 [clinic start generated code]*/
 
 static PyObject *
 os__fcopyfile_impl(PyObject *module, int infd, int outfd, int flags)
-/*[clinic end generated code: output=8e8885c721ec38e3 input=aeb9456804eec879]*/
+/*[clinic end generated code: output=8e8885c721ec38e3 input=69e0770e600cb44f]*/
 {
     int ret;
