bpo-33671: efficient zero-copy for shutil.copy* functions (Linux, OSX and Win) #7160

Merged
merged 114 commits into from
Jun 12, 2018
1a72c01
have shutil.copyfileobj use sendfile() if possible
giampaolo May 22, 2018
77c4bfa
refactoring: use ctx manager
giampaolo May 22, 2018
2afa04a
add test with non-regular file obj
giampaolo May 22, 2018
542cd17
emulate case where file size can't be determined
giampaolo May 22, 2018
3520c6c
reference _copyfileobj_sendfile directly
giampaolo May 22, 2018
050a722
add test for offset() at certain position
giampaolo May 22, 2018
c1fd38a
add test for empty file
giampaolo May 22, 2018
2ab6317
add test for non regular file dst
giampaolo May 22, 2018
dacc3b6
small refactoring
giampaolo May 22, 2018
29d5881
leave copyfileobj() alone in order to not introduce any incompatibility
giampaolo May 24, 2018
114c4de
minor refactoring
giampaolo May 24, 2018
501c0dd
remove old test
giampaolo May 24, 2018
41b4506
update docstring
giampaolo May 24, 2018
fdb0973
update docstring; rename exception class
giampaolo May 24, 2018
64d2bc5
detect platforms which only support file to socket zero copy
giampaolo May 24, 2018
3a3c8ef
don't run test on platforms where file-to-file zero copy is not suppo…
giampaolo May 24, 2018
7861737
use tempfiles
giampaolo May 24, 2018
f3eecfd
reset verbosity
giampaolo May 24, 2018
f67ce57
add test for smaller chunks
giampaolo May 24, 2018
d457254
add big file size test
giampaolo May 24, 2018
8eb211d
add comment
giampaolo May 24, 2018
a0fe703
update doc
giampaolo May 24, 2018
7296147
update whatsnew doc
giampaolo May 24, 2018
d0c3bba
update doc
giampaolo May 24, 2018
2cafd80
catch Exception
giampaolo May 24, 2018
bb2a75f
remove unused import
giampaolo May 24, 2018
e5025dc
add test case for error on second sendfile() call
giampaolo May 24, 2018
a36a534
turn docstring into comment
giampaolo May 24, 2018
e9da3fa
add one more test
giampaolo May 24, 2018
9fcc2e7
update comment
giampaolo May 24, 2018
4f32242
add Misc/NEWS entry
giampaolo May 24, 2018
24ad25a
get rid of COPY_BUFSIZE; it belongs to another PR
giampaolo May 25, 2018
24d20e6
update doc
giampaolo May 25, 2018
7b6e576
expose posix._fcopyfile() for OSX
giampaolo May 27, 2018
b82ddc9
Merge branch 'master' into shutil-osx-copyfile
giampaolo May 27, 2018
b62b61e
merge from linux branch
giampaolo May 27, 2018
34e9618
merge from linux branch
giampaolo May 27, 2018
6b20902
expose fcopyfile
giampaolo May 27, 2018
abf3ecb
arg clinic for the win implementation
giampaolo May 28, 2018
91e492c
convert path type to path_t
giampaolo May 28, 2018
e02c69d
expose CopyFileW
giampaolo May 28, 2018
73837e2
fix windows tests
giampaolo May 28, 2018
28be4c1
release GIL
giampaolo May 28, 2018
6c59adf
minor refactoring
giampaolo May 28, 2018
700629d
update doc
giampaolo May 28, 2018
077912e
update comment
giampaolo May 28, 2018
62c6568
update docstrings
giampaolo May 28, 2018
a40a755
rename functions
giampaolo May 28, 2018
7ba0085
rename test classes
giampaolo May 28, 2018
6c96d97
update doc
giampaolo May 28, 2018
80fbe6e
update doc
giampaolo May 28, 2018
fdf4bcb
update docstrings and comments
giampaolo May 28, 2018
185f130
avoid do import nt|posix modules if unnecessary
giampaolo May 28, 2018
c8c98ae
set nt|posix modules to None if not available
giampaolo May 28, 2018
17bb5e6
micro speedup
giampaolo May 28, 2018
d8b9bf9
update description
giampaolo May 28, 2018
b59ac57
add doc note
giampaolo May 28, 2018
8eefce7
use better wording in doc
giampaolo May 29, 2018
4fc8c6b
Merge branch 'master' into shutil-zero-copy
giampaolo May 30, 2018
3048e3d
rename function using 'fastcopy' prefix instead of 'zerocopy'
giampaolo May 30, 2018
11102e1
use :ref: in rst doc
giampaolo May 30, 2018
7545273
change wording in doc
giampaolo May 30, 2018
3261b74
add test to make sure sendfile() doesn't get called aymore in case it…
giampaolo May 30, 2018
51c476d
move CopyFileW in _winapi and actually expose CopyFileExW instead
giampaolo May 30, 2018
729dd23
fix line endings
giampaolo May 30, 2018
1823828
add tests for mode bits
giampaolo May 30, 2018
a9d6a07
add docstring
giampaolo May 30, 2018
e3ce917
remove test file mode class; let's keep it for later when Istart addr…
giampaolo May 30, 2018
f81a0ec
update doc to reflect new changes
giampaolo May 30, 2018
3e7475b
update doc
giampaolo May 30, 2018
05dd3cf
adjust tests on win
giampaolo May 31, 2018
9b54930
fix argument clinic error
giampaolo May 31, 2018
2bec11c
update doc
giampaolo May 31, 2018
c87648f
OSX: expose copyfile(3) instead of fcopyfile(3); also expose flags ar…
giampaolo May 31, 2018
941f740
osx / copyfile: use path_t instead of char
giampaolo May 31, 2018
4d28c12
do not set dst name in the OSError exception in order to remain consi…
giampaolo May 31, 2018
2149b8b
add same file test
giampaolo May 31, 2018
6a02a2a
add test for same file
giampaolo May 31, 2018
2287508
have osx copyfile() pre-emptively check if src and dst are the same, …
giampaolo May 31, 2018
b9da5d5
turn PermissionError into appropriate SameFileError
giampaolo May 31, 2018
c921f46
expose ERROR_SHARING_VIOLATION in order to raise more appropriate Sam…
giampaolo May 31, 2018
bb24490
honour follow_symlinks arg when using CopyFileEx
giampaolo May 31, 2018
fef8b32
update Misc/NEWS
giampaolo May 31, 2018
71be453
expose CreateDirectoryEx mock
giampaolo Jun 5, 2018
6035fe2
change C type
giampaolo Jun 6, 2018
8dc651e
CreateDirectoryExW actual implementation
giampaolo Jun 6, 2018
5d0eada
provide specific makedirs() implementation for win
giampaolo Jun 6, 2018
d67cdc5
Merge branch 'shutil-zero-copy-8' of https://github.com/giampaolo/cpy…
giampaolo Jun 6, 2018
f65c8ae
fix typo
giampaolo Jun 6, 2018
9c4508e
skeleton for SetNamedSecurityInfo
giampaolo Jun 6, 2018
bb1fee6
get security info for src path
giampaolo Jun 6, 2018
566898a
finally set security attrs
giampaolo Jun 6, 2018
f435053
add unit tests
giampaolo Jun 6, 2018
30c9a57
mimick os.makedirs() behavior and raise if dst dir exists
giampaolo Jun 6, 2018
33f362f
set 2 paths for OSError object
giampaolo Jun 6, 2018
e17e729
set 2 paths for OSError object
giampaolo Jun 6, 2018
bc46f75
expand windows test
giampaolo Jun 6, 2018
cabbc02
in case of exception on os.sendfile() set filename and filename2 exce…
giampaolo Jun 6, 2018
d22ee08
set 2 filenames (src, dst) for OSError in case copyfile() fails on OSX
giampaolo Jun 6, 2018
7a08203
update doc
giampaolo Jun 7, 2018
ab284e9
do not use CreateDirectoryEx() in copytree() if source dir is a symli…
giampaolo Jun 7, 2018
ac9479d
use bytearray() and readinto()
giampaolo Jun 7, 2018
fd77a7e
use memoryview() with bytearray()
giampaolo Jun 7, 2018
42a597e
refactoring + introduce a new _fastcopy_binfileobj() fun
giampaolo Jun 8, 2018
5008a8d
remove CopyFileEx and other C wrappers
giampaolo Jun 8, 2018
e89dd20
remove code related to CopyFileEx
giampaolo Jun 8, 2018
c0dc4b8
Recognize binary files in copyfileobj()
giampaolo Jun 8, 2018
29b9730
set 1MB copy bufsize on win; also add a global _COPY_BUFSIZE variable
giampaolo Jun 8, 2018
a1bed32
use ctx manager for memoryview()
giampaolo Jun 8, 2018
d9d27a7
update doc
giampaolo Jun 9, 2018
17bd78b
remove outdated doc
giampaolo Jun 9, 2018
b1d4917
remove last CopyFileEx remnants
giampaolo Jun 9, 2018
5ce94e4
OSX - use fcopyfile(3) instead of copyfile(3)
giampaolo Jun 12, 2018
07bcef5
update doc
giampaolo Jun 12, 2018
53 changes: 51 additions & 2 deletions Doc/library/shutil.rst
Expand Up @@ -51,7 +51,9 @@ Directory and files operations
.. function:: copyfile(src, dst, *, follow_symlinks=True)

Copy the contents (no metadata) of the file named *src* to a file named
- *dst* and return *dst*. *src* and *dst* are path names given as strings.
+ *dst* and return *dst* in the most efficient way possible.
+ *src* and *dst* are path names given as strings.

*dst* must be the complete target file name; look at :func:`shutil.copy`
for a copy that accepts a target directory path. If *src* and *dst*
specify the same file, :exc:`SameFileError` is raised.
Expand All @@ -74,6 +76,10 @@ Directory and files operations
Raise :exc:`SameFileError` instead of :exc:`Error`. Since the former is
a subclass of the latter, this change is backward compatible.

.. versionchanged:: 3.8
Platform-specific fast-copy syscalls may be used internally in order to
copy the file more efficiently. See
:ref:`shutil-platform-dependent-efficient-copy-operations` section.

.. exception:: SameFileError

Expand Down Expand Up @@ -163,6 +169,11 @@ Directory and files operations
Added *follow_symlinks* argument.
Now returns path to the newly created file.

.. versionchanged:: 3.8
Platform-specific fast-copy syscalls may be used internally in order to
copy the file more efficiently. See
:ref:`shutil-platform-dependent-efficient-copy-operations` section.

.. function:: copy2(src, dst, *, follow_symlinks=True)

Identical to :func:`~shutil.copy` except that :func:`copy2`
Expand All @@ -185,6 +196,11 @@ Directory and files operations
file system attributes too (currently Linux only).
Now returns path to the newly created file.

.. versionchanged:: 3.8
Platform-specific fast-copy syscalls may be used internally in order to
copy the file more efficiently. See
:ref:`shutil-platform-dependent-efficient-copy-operations` section.

.. function:: ignore_patterns(\*patterns)

This factory function creates a function that can be used as a callable for
Expand Down Expand Up @@ -241,6 +257,10 @@ Directory and files operations
Added the *ignore_dangling_symlinks* argument to silent dangling symlinks
errors when *symlinks* is false.

.. versionchanged:: 3.8
Platform-specific fast-copy syscalls may be used internally in order to
copy the file more efficiently. See
:ref:`shutil-platform-dependent-efficient-copy-operations` section.

.. function:: rmtree(path, ignore_errors=False, onerror=None)

Expand Down Expand Up @@ -314,6 +334,11 @@ Directory and files operations
.. versionchanged:: 3.5
Added the *copy_function* keyword argument.

.. versionchanged:: 3.8
Platform-specific fast-copy syscalls may be used internally in order to
copy the file more efficiently. See
:ref:`shutil-platform-dependent-efficient-copy-operations` section.

.. function:: disk_usage(path)

Return disk usage statistics about the given path as a :term:`named tuple`
Expand Down Expand Up @@ -370,6 +395,28 @@ Directory and files operations
operation. For :func:`copytree`, the exception argument is a list of 3-tuples
(*srcname*, *dstname*, *exception*).

.. _shutil-platform-dependent-efficient-copy-operations:

Platform-dependent efficient copy operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Starting from Python 3.8 all functions involving a file copy (:func:`copyfile`,
:func:`copy`, :func:`copy2`, :func:`copytree`, and :func:`move`) may use
platform-specific "fast-copy" syscalls in order to copy the file more
efficiently (see :issue:`33671`).
"fast-copy" means that the copying operation occurs within the kernel, avoiding
the use of userspace buffers in Python as in "``outfd.write(infd.read())``".

On OSX `fcopyfile`_ is used to copy the file content (not metadata).

On Linux, Solaris and other POSIX platforms where :func:`os.sendfile` supports
copies between 2 regular file descriptors :func:`os.sendfile` is used.

If the fast-copy operation fails and no data was written in the destination
file then shutil will silently fall back on using the less efficient
:func:`copyfileobj` function internally.

.. versionchanged:: 3.8
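
The fallback rule described in this section can be sketched in a few lines. This is an illustration only, not the code from this PR: `copy_with_sendfile` is a hypothetical helper, the 8 MiB block size is an assumption, and error handling is reduced to "retry in userspace only if nothing was written yet".

```python
import os
import shutil


def copy_with_sendfile(src_path, dst_path, blocksize=2 ** 23):
    """Copy src_path to dst_path, trying os.sendfile() first.

    Hypothetical helper for illustration; not part of shutil.
    Returns "sendfile" or "fallback" to show which path ran.
    """
    with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
        if hasattr(os, "sendfile"):
            offset = 0
            try:
                while True:
                    # The kernel moves the bytes; no Python-level buffer is filled.
                    sent = os.sendfile(fdst.fileno(), fsrc.fileno(),
                                       offset, blocksize)
                    if sent == 0:
                        return "sendfile"  # EOF reached
                    offset += sent
            except OSError:
                if offset != 0:
                    raise  # data was already written: do not retry silently
        # Userspace fallback: read() into Python objects, then write() them out.
        shutil.copyfileobj(fsrc, fdst)
        return "fallback"
```

Because `os.sendfile()` is called with an explicit offset, the position of `fsrc` is untouched when the first call fails, so the fallback still starts reading from the beginning of the file.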

.. _shutil-copytree-example:

Expand Down Expand Up @@ -654,6 +701,8 @@ Querying the size of the output terminal

.. versionadded:: 3.3

.. _`fcopyfile`:
http://www.manpagez.com/man/3/copyfile/

.. _`Other Environment Variables`:
http://pubs.opengroup.org/onlinepubs/7908799/xbd/envvar.html#tag_002_003

19 changes: 18 additions & 1 deletion Doc/whatsnew/3.8.rst
Expand Up @@ -90,10 +90,27 @@ New Modules
Improved Modules
================


Optimizations
=============

* :func:`shutil.copyfile`, :func:`shutil.copy`, :func:`shutil.copy2`,
  :func:`shutil.copytree` and :func:`shutil.move` use platform-specific
  "fast-copy" syscalls on Linux, OSX and Solaris in order to copy the file more
  efficiently.
  "fast-copy" means that the copying operation occurs within the kernel,
  avoiding the use of userspace buffers in Python as in
  "``outfd.write(infd.read())``".
  All other platforms not using such a technique will rely on a faster
  :func:`shutil.copyfile` implementation using :func:`memoryview`,
  :class:`bytearray` and
  :meth:`BufferedIOBase.readinto() <io.BufferedIOBase.readinto>`.
  Finally, :func:`shutil.copyfile` default buffer size on Windows was increased
  from 16KB to 1MB.
  The speedup for copying a 512MB file within the same partition is about +26%
  on Linux, +50% on OSX and +38% on Windows. Also, far fewer CPU cycles are
  consumed.
  (Contributed by Giampaolo Rodola' in :issue:`33671`.)

* The default protocol in the :mod:`pickle` module is now Protocol 4,
  first introduced in Python 3.4. It offers better performance and smaller
  size compared to Protocol 3 available since Python 3.0.
Expand Down
157 changes: 145 additions & 12 deletions Lib/shutil.py
Expand Up @@ -10,6 +10,7 @@
import fnmatch
import collections
import errno
import io

try:
    import zlib
Expand Down Expand Up @@ -42,6 +43,16 @@
except ImportError:
    getgrnam = None

posix = nt = None
if os.name == 'posix':
    import posix
elif os.name == 'nt':
    import nt

COPY_BUFSIZE = 1024 * 1024 if os.name == 'nt' else 16 * 1024
_HAS_SENDFILE = posix and hasattr(os, "sendfile")
_HAS_FCOPYFILE = posix and hasattr(posix, "_fcopyfile")  # OSX

__all__ = ["copyfileobj", "copyfile", "copymode", "copystat", "copy", "copy2",
           "copytree", "move", "rmtree", "Error", "SpecialFileError",
           "ExecError", "make_archive", "get_archive_formats",
Expand Down Expand Up @@ -72,14 +83,124 @@ class RegistryError(Exception):
"""Raised when a registry operation with the archiving
and unpacking registries fails"""

class _GiveupOnFastCopy(Exception):
    """Raised as a signal to fall back on using raw read()/write()
    file copy when fast-copy functions fail to do so.
    """

def _fastcopy_osx(fsrc, fdst, flags):
    """Copy a regular file content or metadata by using high-performance
    fcopyfile(3) syscall (OSX).
    """
    try:
        infd = fsrc.fileno()
        outfd = fdst.fileno()
    except Exception as err:
        raise _GiveupOnFastCopy(err)  # not a regular file

    try:
        posix._fcopyfile(infd, outfd, flags)
    except OSError as err:
        err.filename = fsrc.name
        err.filename2 = fdst.name
        if err.errno in {errno.EINVAL, errno.ENOTSUP}:
            raise _GiveupOnFastCopy(err)
        else:
            raise err from None

def _fastcopy_sendfile(fsrc, fdst):
    """Copy data from one regular mmap-like fd to another by using
    high-performance sendfile(2) syscall.
    This should work on Linux >= 2.6.33 and Solaris only.
    """
    # Note: copyfileobj() is left alone in order to not introduce any
    # unexpected breakage. Possible risks by using zero-copy calls
    # in copyfileobj() are:
    # - fdst cannot be open in "a"(ppend) mode
    # - fsrc and fdst may be open in "t"(ext) mode
    # - fsrc may be a BufferedReader (which hides unread data in a buffer),
    #   GzipFile (which decompresses data), HTTPResponse (which decodes
    #   chunks).
    # - possibly others (e.g. encrypted fs/partition?)
    global _HAS_SENDFILE
    try:
        infd = fsrc.fileno()
        outfd = fdst.fileno()
    except Exception as err:
        raise _GiveupOnFastCopy(err)  # not a regular file

    # Hopefully the whole file will be copied in a single call.
    # sendfile() is called in a loop until EOF is reached (0 return)
    # so a bufsize smaller or bigger than the actual file size
    # should not make any difference, also in case the file content
    # changes while being copied.
    try:
        blocksize = max(os.fstat(infd).st_size, 2 ** 23)  # min 8MB
    except Exception:
        blocksize = 2 ** 27  # 128MB

    offset = 0
    while True:
        try:
            sent = os.sendfile(outfd, infd, offset, blocksize)
        except OSError as err:
            # ...in order to have a more informative exception.
            err.filename = fsrc.name
            err.filename2 = fdst.name

            if err.errno == errno.ENOTSOCK:
                # sendfile() on this platform (probably Linux < 2.6.33)
                # does not support copies between regular files (only
                # sockets).
                _HAS_SENDFILE = False
                raise _GiveupOnFastCopy(err)

            if err.errno == errno.ENOSPC:  # filesystem is full
                raise err from None

            # Give up on first call and if no data was copied.
            if offset == 0 and os.lseek(outfd, 0, os.SEEK_CUR) == 0:
                raise _GiveupOnFastCopy(err)

            raise err
        else:
            if sent == 0:
                break  # EOF
            offset += sent

def _copybinfileobj(fsrc, fdst, length=COPY_BUFSIZE):
    """Copy 2 regular file objects open in binary mode."""
    # Localize variable access to minimize overhead.
    fsrc_readinto = fsrc.readinto
    fdst_write = fdst.write
    with memoryview(bytearray(length)) as mv:
        while True:
            n = fsrc_readinto(mv)
            if not n:
                break
            elif n < length:
                fdst_write(mv[:n])
Contributor

In my test case I used another with block to release this mv[:n] view, rather than depend on implicit deallocation to release it. For example:

>>> b = bytearray(100)
>>> mv1 = memoryview(b)
>>> mv2 = mv1[:10]
>>> mv1.release()
>>> b.append(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BufferError: Existing exports of data: object cannot be re-sized
>>> mv2.release()
>>> b.append(0)
>>> len(b)
101

This bytearray is internal, but is there any issue with memory usage in garbage-collected versions of Python (e.g. Jython, IronPython) if the views on the buffer (1 MiB in Windows) aren't released explicitly? If not you can remove the first with block as well.

Contributor Author

Uhm... yes, given the big bufsize I think it makes sense to also immediately release the sliced memoryview.

Contributor

@eryksun If I recall correctly, memory views inadvertently keeping large memory buffers alive on GC based implementations was a key driver in adding context management support to memoryview in the first place, so that's definitely a concern worth keeping in mind for this kind of code.
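
The point about releasing views explicitly can be checked with a small sketch. `resize_after_release` is a hypothetical name used only for this illustration; the behaviour shown is that of CPython, where a `bytearray` cannot be resized while any `memoryview` export is still alive.

```python
def resize_after_release():
    """Show when a bytearray becomes resizable again (illustrative only)."""
    buf = bytearray(100)
    with memoryview(buf) as mv:
        with mv[:10] as head:
            first = bytes(head)  # copy the data out while the slice is alive
        # The slice is released here, but the outer view still pins the buffer.
        try:
            buf.append(0)
        except BufferError:
            blocked = True
        else:
            blocked = False
    # Both views are released now, so the resize succeeds.
    buf.append(0)
    return blocked, len(buf), len(first)
```

Nesting the two `with` blocks makes the release order explicit instead of relying on reference counting, which is the concern raised for garbage-collected implementations.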

            else:
                fdst_write(mv)

def _is_binary_files_pair(fsrc, fdst):
    return hasattr(fsrc, 'readinto') and \
        isinstance(fsrc, io.BytesIO) or 'b' in getattr(fsrc, 'mode', '') and \
Contributor

@eryksun eryksun Jun 12, 2018

Which objects provide readinto in text mode? Is it worth the function call and extra tests rather than handling AttributeError (no readinto) and TypeError (can't write bytes) with an inline try-except in copyfileobj?

Contributor Author

I think catching TypeError on fdst.write() is too risky as we can deal with any kind of custom file-like object being passed here. It must be noted that the extra cost of this function is paid by users of copyfileobj() only (e.g. tarfile and zipfile modules). copyfile() (and others) will skip this check and call _copybinfileobj() directly.
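
For readers following the capability-check approach, here is a minimal sketch with the grouping written out explicitly. It is not the code under review: `looks_binary` and `is_binary_pair` are hypothetical names, and the `isinstance(mode, str)` guard is an added assumption, since an arbitrary file-like object is not guaranteed to expose a string `mode`. In Python `and` binds more tightly than `or`, so a condition mixing the two needs parentheses or helper functions to express "both sides are binary".

```python
import io


def looks_binary(f):
    """True for io.BytesIO or for objects whose string mode contains 'b'."""
    if isinstance(f, io.BytesIO):
        return True
    mode = getattr(f, "mode", "")
    return isinstance(mode, str) and "b" in mode


def is_binary_pair(fsrc, fdst):
    """True only if fsrc supports readinto() and both objects look binary."""
    return hasattr(fsrc, "readinto") and looks_binary(fsrc) and looks_binary(fdst)
```

With the checks grouped this way, a text-mode source paired with a binary destination is rejected, so the `readinto()` path is only taken when both ends handle bytes.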

Contributor Author

I benchmarked _copybinfileobj() (I hadn't yet) and it turns out it's only slightly faster for 512MB files but considerably slower for 8MB and 128KB files so I am gonna remove it.

Contributor

I don't know how you're testing, but the performance difference with readinto depends on whether the source file is already in the system cache. Otherwise, of course memory operations will be dwarfed by considerably slower disk I/O.

Contributor Author

This is how I'm testing it:

$ python -c "import os; f = open('f1', 'wb'); f.write(os.urandom(8 * 1024 * 1024))"
$ time ./python -m timeit -s 'import shutil; p1 = "f1"; p2 = "f2"' 'shutil.copyfile(p1, p2)'

Contributor Author

@giampaolo giampaolo Jun 13, 2018

I wrote a batch script to figure out timings on Windows more easily and this is the result (first value is original copyfileobj() implementation, second value is the memoryview() variant):

8MB file
1000 loops, best of 5: 343 usec per loop
500 loops, best of 5: 478 usec per loop

64MB file
500 loops, best of 5: 474 usec per loop
500 loops, best of 5: 554 usec per loop

128MB file
200 loops, best of 5: 1.06 msec per loop
500 loops, best of 5: 640 usec per loop

256MB file
1 loop, best of 5: 286 msec per loop
5 loops, best of 5: 36.1 msec per loop

512MB file
1 loop, best of 5: 293 msec per loop
5 loops, best of 5: 36.7 msec per loop

I think the memoryview() variant is so much faster above 128MB that it is worth keeping the dual implementation and using it from the copyfile() function only on Windows and only when the file size exceeds 128MB. I will do it in my other PR/branch.

On the other hand, the same test on Linux shows there is no relevant difference for 512 MB files and a performance degradation for smaller ones.

Contributor

This command is wrong:

./python  -m timeit -s "import shutil; f1 = open('f1', 'rb'); f2 = open('f2', 'wb')" "shutil.copyfileobj(f1, f2)"

The files need to be opened for each pass of the loop, not the setup. That explains the unexpected results. I corrected it to open the files in the loop statement instead of the setup and tested a broad range of file sizes. In the table below all times are best of five for the given number of loops, and normalized overall to make the 64KiB result in the RI_S column equal to 100 time units. I discuss the RI_S case in more detail below.

   SIZE | LOOPS |   R_16K ||    R_1M   %CHNG |  RI_1M   %CHNG |  RI_S   %CHNG
--------+-------+---------||-----------------+----------------+--------------
  1 GiB |     5 | 1060870 || 1003478    -5.4 | 977391    -7.9 | 963478   -9.2
512 MiB |    10 |  323478 ||  283478   -12.4 | 217391   -32.8 | 213913  -33.9
256 MiB |    20 |  163304 ||  146435   -10.3 | 112870   -30.9 | 110957  -32.1
128 MiB |    40 |   80174 ||   74609    -6.9 |  55478   -30.8 |  55478  -30.8
 64 MiB |    80 |   39652 ||   34783   -12.3 |  27304   -31.1 |  27130  -31.6
 32 MiB |   160 |   19478 ||   17913    -8.0 |  13809   -29.1 |  13478  -30.8
 16 MiB |   320 |    9739 ||    8887    -8.7 |   6887   -29.3 |   6835  -29.8
  8 MiB |   640 |    4904 ||    4713    -3.9 |   3652   -25.5 |   3548  -27.7
  4 MiB |  1280 |    2504 ||    2348    -6.2 |   1948   -22.2 |   1913  -23.6
  2 MiB |  2560 |    1150 ||    1193     3.7 |   1002   -12.9 |    991  -13.8
  1 MiB |  5120 |     649 ||     697     7.4 |    659     1.5 |    663    2.2
512 KiB | 10240 |     388 ||     553    42.5 |    499    28.6 |    322  -17.0
256 KiB | 20480 |     245 ||     459    87.3 |    410    67.3 |    205  -16.3
128 KiB | 30720 |     170 ||     388   128.2 |    362   112.9 |    139  -18.2
 64 KiB | 40960 |     123 ||     353   187.0 |    343   178.9 |    100  -18.7

    R_16K -- read 16 KiB
    R_1M  -- read 1 MiB
    RI_1M -- readinto 1 MiB
    RI_S  -- readinto source size up to 1 MiB

Originally I had tested at 128 MiB with a custom test script to focus on the effects of cached vs non-cached I/O. I assumed the results would be similar for other cases. As shown in the RI_1M column, that's basically true for files larger than 1 megabyte. But there's a significant performance degradation for smaller files. In the RI_S case, I address this by calling os.fstat on the source file to cap the length of the bytearray at its size. This avoids wastefully over-allocating a zeroed bytearray.

RI_S also experiments with calling SetFileInformationByHandle with FileEndOfFileInfo (prototyped using ctypes) to avoid having to repeatedly extend the file when length is less than the size of the source file. (Note that Python's truncate method is of no use here since it zeros the file.) CopyFileEx does this, so I figured it was worth a try. This does provide a modest performance increase. (Note that I mistakenly included the length == size boundary, which is apparent in the 1 MiB trial.) I don't know if it's significant enough to justify implementing _winapi.SetFileInformationByHandle.

If I have time, I may run another experiment using mmap to read into a sliding window of the destination file. It already implements setting the end of the file, albeit with the less efficient combination of SetFilePointer and SetEndOfFile. (This way requires 4 system calls instead of 1 to set the end of the file.)

Below is the code for RI_S:

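import os

_WINDOWS = (os.name == 'nt')

# Defined outside the Windows branch so that the default argument of
# copyfileobj() below also resolves on other platforms.
COPY_BUFSIZE = 1024 * 1024

if _WINDOWS:
    import msvcrt
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)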

    FileEndOfFileInfo = 6

    FILE_INFO_BY_HANDLE_CLASS = wintypes.DWORD
    kernel32.SetFileInformationByHandle.argtypes = (
        wintypes.HANDLE,           # _In_ hFile
        FILE_INFO_BY_HANDLE_CLASS, # _In_ FileInformationClass
        wintypes.LPVOID,           # _In_ lpFileInformation
        wintypes.DWORD)            # _In_ dwBufferSize

def copyfileobj(fsrc, fdst, length=COPY_BUFSIZE):
    size = os.fstat(fsrc.fileno()).st_size
    if length > size:
        length = size
    elif _WINDOWS:
        info = wintypes.LARGE_INTEGER(size)
        if not kernel32.SetFileInformationByHandle(
                    msvcrt.get_osfhandle(fdst.fileno()), FileEndOfFileInfo, 
                    ctypes.byref(info), ctypes.sizeof(info)):
            raise ctypes.WinError(ctypes.get_last_error())
    fsrc_readinto = fsrc.readinto
    fdst_write = fdst.write
    with memoryview(bytearray(length)) as mv:
        while True:
            n = fsrc_readinto(mv)
            if not n:
                break
            elif n < length:
                with mv[:n] as smv:
                    fdst_write(smv)
            else:
                fdst_write(mv)
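
The benchmarking correction made earlier in this comment (open the files on every pass, not once in the setup) can be reproduced with a small harness. This is a sketch under assumptions: `time_copyfileobj` is a hypothetical helper, and the loop and repeat counts are arbitrary. If the files are opened only once, every pass after the first starts with the source already at EOF and copies nothing, which makes the timings meaningless.

```python
import os
import shutil
import tempfile
import timeit


def time_copyfileobj(size, loops=5, repeat=3):
    """Return (best seconds per copy, bytes in the destination file)."""
    tmpdir = tempfile.mkdtemp()
    src = os.path.join(tmpdir, "f1")
    dst = os.path.join(tmpdir, "f2")
    with open(src, "wb") as f:
        f.write(os.urandom(size))

    def one_pass():
        # Both files are opened inside the timed callable, so every pass
        # reads the source from offset 0 and rewrites the destination.
        with open(src, "rb") as f1, open(dst, "wb") as f2:
            shutil.copyfileobj(f1, f2)

    best = min(timeit.repeat(one_pass, number=loops, repeat=repeat)) / loops
    copied = os.path.getsize(dst)
    shutil.rmtree(tmpdir)
    return best, copied
```

Checking the destination size after the run is a cheap way to confirm that the timed statement really copied the whole file.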

Contributor Author

Interesting. Thanks for the very detailed benchmark. I updated the other branch which now dynamically sets memoryview() size based on file size (93ebc1f) and I confirm using the readinto() variant is faster also for smaller files.
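
A portable sketch of sizing the buffer from the source file, without the ctypes part of the prototype above, might look like the following. It does not reproduce the referenced commit: `pick_bufsize` and `copy_readinto` are hypothetical names, and the 1 MiB cap is an assumption taken from the figure discussed in this thread.

```python
import os

COPY_BUFSIZE = 1024 * 1024  # assumed cap, matching the 1 MiB discussed above


def pick_bufsize(fsrc, limit=COPY_BUFSIZE):
    """Return a buffer size no larger than the source file (hypothetical)."""
    try:
        size = os.fstat(fsrc.fileno()).st_size
    except (AttributeError, OSError):
        # No usable file descriptor (for example io.BytesIO): keep the cap.
        return limit
    # A source reporting size 0 still gets a non-empty buffer.
    return min(limit, size) if size > 0 else limit


def copy_readinto(fsrc, fdst, limit=COPY_BUFSIZE):
    """Copy via readinto() into a right-sized buffer; return bytes copied."""
    total = 0
    with memoryview(bytearray(pick_bufsize(fsrc, limit))) as mv:
        while True:
            n = fsrc.readinto(mv)
            if not n:
                return total
            with mv[:n] as chunk:  # release the slice explicitly
                fdst.write(chunk)
            total += n
```

Capping the buffer at the file size avoids allocating and zeroing a full megabyte for small files, which is where the RI_1M column in the table above loses ground.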

Contributor Author

#7681 was merged

        isinstance(fdst, io.BytesIO) or 'b' in getattr(fdst, 'mode', '')

-def copyfileobj(fsrc, fdst, length=16*1024):
+def copyfileobj(fsrc, fdst, length=COPY_BUFSIZE):
     """copy data from file-like object fsrc to file-like object fdst"""
-    while 1:
-        buf = fsrc.read(length)
-        if not buf:
-            break
-        fdst.write(buf)
+    if _is_binary_files_pair(fsrc, fdst):
+        _copybinfileobj(fsrc, fdst, length=length)
+    else:
+        # Localize variable access to minimize overhead.
+        fsrc_read = fsrc.read
+        fdst_write = fdst.write
+        while 1:
+            buf = fsrc_read(length)
+            if not buf:
+                break
+            fdst_write(buf)

 def _samefile(src, dst):
     # Macintosh, Unix.
Expand Down Expand Up @@ -117,9 +238,23 @@ def copyfile(src, dst, *, follow_symlinks=True):
     if not follow_symlinks and os.path.islink(src):
         os.symlink(os.readlink(src), dst)
     else:
-        with open(src, 'rb') as fsrc:
-            with open(dst, 'wb') as fdst:
-                copyfileobj(fsrc, fdst)
+        with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
+            if _HAS_SENDFILE:
+                try:
+                    _fastcopy_sendfile(fsrc, fdst)
+                    return dst
+                except _GiveupOnFastCopy:
+                    pass
+
+            if _HAS_FCOPYFILE:
+                try:
+                    _fastcopy_osx(fsrc, fdst, posix._COPYFILE_DATA)
+                    return dst
+                except _GiveupOnFastCopy:
+                    pass
+
+            _copybinfileobj(fsrc, fdst)

     return dst

def copymode(src, dst, *, follow_symlinks=True):
Expand Down Expand Up @@ -244,13 +379,12 @@ def copy(src, dst, *, follow_symlinks=True):

 def copy2(src, dst, *, follow_symlinks=True):
     """Copy data and all stat info ("cp -p src dst"). Return the file's
-    destination."
+    destination.

     The destination may be a directory.

     If follow_symlinks is false, symlinks won't be followed. This
     resembles GNU's "cp -P src dst".
-
     """
     if os.path.isdir(dst):
         dst = os.path.join(dst, os.path.basename(src))
Expand Down Expand Up @@ -1015,7 +1149,6 @@ def disk_usage(path):

 elif os.name == 'nt':

-    import nt
     __all__.append('disk_usage')
     _ntuple_diskusage = collections.namedtuple('usage', 'total used free')
