file_get_contents() and file_put_contents() fail with data >=2GB on macOS & BSD #18753

Open
kiler129 opened this issue Jun 3, 2025 · 0 comments

kiler129 commented Jun 3, 2025

The buggy behavior

macOS (arm64)

Running the following code produces an error:

% php -dmemory_limit=-1 -r 'file_get_contents("big");'
PHP Notice:  file_get_contents(): Read of 4694832713 bytes failed with errno=22 Invalid argument in Command line code on line 1

Notice: file_get_contents(): Read of 4694832713 bytes failed with errno=22 Invalid argument in Command line code on line 1

The function on macOS returns a 0-byte string, as verified with gettype(file_get_contents(...)) and strlen(file_get_contents(...)). The file is almost 5GB in size:

% php -r 'echo filesize("big") . "\n";'
4694824521

Note the size reported: it looks as if macOS tries to read exactly 8,192 bytes past the file size, but this is probably not related; see below

Comparing to Linux (x86_64)

On a fully updated Debian 13.0 the result is different:

$ php -dmemory_limit=-1 -r 'echo gettype(file_get_contents("big")) . "\n";'
string

$  php -dmemory_limit=-1 -r 'echo strlen(file_get_contents("big")) . "\n";'
4694824521

PHP Versions

macOS installed via Homebrew:

% php -v
PHP 8.4.7 (cli) (built: May  6 2025 12:31:58) (NTS)
Copyright (c) The PHP Group
Built by Shivam Mathur
Zend Engine v4.4.7, Copyright (c) Zend Technologies
    with Zend OPcache v8.4.7, Copyright (c), by Zend Technologies

Linux:

$ php -v
PHP 8.4.7 (cli) (built: May  9 2025 07:02:39) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.4.7, Copyright (c) Zend Technologies
    with Zend OPcache v8.4.7, Copyright (c), by Zend Technologies

Operating System

% sw_vers
ProductName:		macOS
ProductVersion:		15.5
BuildVersion:		24F74

% uname -m
arm64

Looking for the culprit

How does it fail?

While I am not a C developer, nor do I have great familiarity with the ZE codebase, I tried to take a crack at this. The error seems to be coming from php_stdiop_read():

if (!(stream->flags & PHP_STREAM_FLAG_SUPPRESS_ERRORS)) {
    php_error_docref(NULL, E_NOTICE, "Read of %zu bytes failed with errno=%d %s", count, errno, strerror(errno));
}

Initially I thought it was about the 4GB boundary, or about the error reporting a size that is 8K off from the real file size, but that doesn't seem to be the case. In fact, any read of 2GB or more fails:

php > echo strlen(file_get_contents('big', length: 2 * 1024 * 1024 * 1024 - 1));
2147483647

php > echo strlen(file_get_contents('big', length: 2 * 1024 * 1024 * 1024));
PHP Notice:  file_get_contents(): Read of 2147483648 bytes failed with errno=22 Invalid argument in php shell code on line 1

Notice: file_get_contents(): Read of 2147483648 bytes failed with errno=22 Invalid argument in php shell code on line 1
0

file_get_contents() fails only for regular files, regardless of the underlying filesystem (tested on regular APFS & HFS+ ramdisk):

php > echo strlen(file_get_contents('/dev/zero', length: 5 * 1024 * 1024 * 1024));
5368709120

The issue seems to be isolated to file_get_contents() only. My initial hunch about reads in chunks larger than SSIZE_MAX also led nowhere, as a single fread() is able to read the whole file just fine:

php > echo strlen(fread(fopen('big', 'r'), filesize('big')));
4694824521

php > var_dump(stream_copy_to_stream(fopen('big','r'), fopen('dst','w')));
int(4694824521)

The issue is also not related to an old bug 69824 of mine with variables >2GB, as on modern PHP versions creating a 5GB string (i.e. larger than the file) isn't a problem.
I also couldn't replicate it using PHP code that doesn't use file_get_contents().

Why does it fail?

If I'm reading the file_get_contents() implementation for files correctly, it calls _php_stream_copy_to_mem(), which then calls the universal _php_stream_read(), which in turn calls stream->ops->read() on the stream. I think that for plain files this op is php_stdiop_read().

I suspected that read(2) is being called with the full $length as passed to file_get_contents(). This points to the behavior of read(2) being different between Darwin and Linux.
I wrote a quick C reproducer and tested:

### macOS
Platform SSIZE_MAX=9223372036854775807
Platform INT_MAX=2147483647
=================================
Trying to get 2147483648 from big
File "big" opened, allocating memory...
Memory allocated, attempting read...
!! read() failed - errno=22 err=Invalid argument
=================================
Trying to get 2147483647 from big
File "big" opened, allocating memory...
Memory allocated, attempting read...
Did read 2147483647 bytes ($req-$actual=0)


### Linux
Platform SSIZE_MAX=9223372036854775807
Platform INT_MAX=2147483647
=================================
Trying to get 2147483648 from big
File "big" opened, allocating memory...
Memory allocated, attempting read...
Did read 2147479552 bytes ($req-$actual=4096)
=================================
Trying to get 2147483647 from big
File "big" opened, allocating memory...
Memory allocated, attempting read...
Did read 2147479552 bytes ($req-$actual=4095)

Linux accepts an arbitrarily large size for read(2) and simply returns the maximum amount it can transfer in one call (the 2GB-4K figure matches Linux's documented per-call cap of 0x7ffff000 bytes), which lets the stream logic handle the stitching. The Darwin/XNU and BSD kernels instead immediately return EINVAL if the requested chunk size is larger than INT_MAX.
The same problem also affects file_put_contents() for the same reasons.
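
For reference, the reproducer itself isn't pasted above; below is a minimal sketch of the same idea (open the file, allocate a buffer, issue a single large read(2)). The file name "big" and the two request sizes are taken from the tests above; everything else is my approximation rather than the original program.

#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Attempt a single read(2) of `want` bytes and report what happened. */
static void try_read(const char *path, size_t want)
{
    printf("=================================\n");
    printf("Trying to get %zu from %s\n", want, path);

    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return; }

    char *buf = malloc(want);
    if (buf == NULL) { perror("malloc"); close(fd); return; }

    ssize_t got = read(fd, buf, want);
    if (got < 0) {
        printf("!! read() failed - errno=%d err=%s\n", errno, strerror(errno));
    } else {
        printf("Did read %zd bytes (req-actual=%zd)\n", got, (ssize_t)want - got);
    }

    free(buf);
    close(fd);
}

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "big";

    printf("Platform SSIZE_MAX=%lld\n", (long long)SSIZE_MAX);
    printf("Platform INT_MAX=%d\n", INT_MAX);

    try_read(path, (size_t)INT_MAX + 1); /* 2147483648: EINVAL on Darwin/BSD */
    try_read(path, (size_t)INT_MAX);     /* 2147483647: succeeds */

    return 0;
}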

Possible fix?

This behavior appears to be known, as stream_set_chunk_size() errors out if the requested chunk size is > INT_MAX on all platforms. Moreover, while debugging I came full circle: php_stdiop_read() does clamp the max chunk/buffer to INT_MAX, but only on Windows.
I think adding the clamping for macOS and BSD, in addition to Windows, is the simplest solution - PR provided.
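
If it helps, the idea looks roughly like this (a sketch only; the platform condition and comments are mine and may differ from the actual PR):

/* Somewhere before the read(2) call in php_stdiop_read()
 * (main/streams/plain_wrapper.c), clamp oversized requests on platforms
 * whose read(2) rejects nbyte > INT_MAX with EINVAL, mirroring the
 * existing Windows-only clamp. The macro list below is an assumption. */
#if defined(__APPLE__) || defined(__FreeBSD__) || defined(__NetBSD__) || \
    defined(__OpenBSD__) || defined(__DragonFly__)
    if (count > INT_MAX) {
        /* Cap the single read; the stream layer will loop for the rest. */
        count = INT_MAX;
    }
#endif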

Affected versions

The issue only appears if the stream read buffer is set to more than INT_MAX, which in the case of file_get_contents() bisects to commit 6beee1a from #8547, which first landed in PHP 8.2.

Knowing this, I found that it isn't a problem with just file_get_contents() but also with fread(), as stream_set_read_buffer() doesn't guard against this:

php > $f = fopen("big", "r"); stream_set_read_buffer($f, 0); fread($f, 2147483648);
PHP Notice:  fread(): Read of 2147483648 bytes failed with errno=22 Invalid argument in php shell code on line 1

Notice: fread(): Read of 2147483648 bytes failed with errno=22 Invalid argument in php shell code on line 1

However, I don't think this needs to be guarded even for DX, as this is a user shooting themselves in the foot. After the patch, the code above will instead fail with Notice: fread(): Read of 2147483648 bytes failed with errno=9 Bad file descriptor.

Dataset

The exact file I encountered the problem with is available from Cornell University. You can get it directly via curl -L -o ~/Downloads/arxiv.zip https://www.kaggle.com/api/v1/datasets/download/Cornell-University/arxiv. However, after some digging I see it's not about this exact file; e.g. a file created with truncate -s 4694824521 big reproduces it too.
