Skip to content

Fixes bug #69181 (csv parsing drops newlines within fields with SplFileObject::DROP_NEW_LINE) #1148

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

g105b
Copy link

@g105b g105b commented Mar 5, 2015

This pull request fixes buggy behaviour when using SplFileObject to work with CSV files.

According to the documentation, the DROP_NEW_LINE flag drops newlines at the end of a line. http://php.net/manual/en/class.splfileobject.php#splfileobject.constants.drop-new-line

However, even though CSV files are obviously stored with new lines at the end of each line, the fields within the CSV can also contain any number of newline characters, as long as they are quoted. In this case, the bug appears, as the DROP_NEW_LINE flag causes c function strcspn to remove the first, and only the first, newline from the entire row of fields.

My solution is to ignore the strcspn function when parsing CSV files, because if there are any newlines within fields, the fields MUST be delimited, therefore no newlines should be dropped within the delimiters.

Greg Bowler added 3 commits March 5, 2015 12:11
Running this test fails because when DROP_NEW_LINES flag is present when
reading CSV files, fields containing quoted new lines are not anticipated.
Fixes bug #69181 (csv parsing drops newlines within fields with
SplFileObject::DROP_NEW_LINE)

when SPL_FILE_OBJECT_READ_CSV is set, a newline does not
necessarily mean the end of the row, so don't drop. CSV rows
can contain multiple quoted newlines.
@g105b
Copy link
Author

g105b commented Mar 5, 2015

The test that fails in the Travis build is ext/phar/tests/phar_oo_009.phpt.

This test is titled "Phar object: iterating via SplFileObject and reading csv", so it seems to me that on line 19 the SplFileObject::READ_CSV flag should be set, as on line 29?

My PR ignores dropping new lines within CSV rows (keeping the original data, as it is stored in CSV), so the outcome is that when SplFileObject::READ_CSV is set, no new lines are dropped (affecting the expected result from the now-failing test).

@laruence and @m6w6 may want to give input to this, regarding the Phar test?

@laruence laruence added the Bug label Mar 5, 2015
@ghost
Copy link

ghost commented Mar 7, 2015

Can one of the admins verify this patch?

@smalyshev
Copy link
Contributor

I'm not sure the result you're getting in ext/phar/tests/phar_oo_009.phpt is correct. Linefeed within delimiters may be passed as is, true, but the linefeed you have there is not within the delimiters. I think you need to handle this case more carefully.

@g105b
Copy link
Author

g105b commented Mar 23, 2015

@smalyshev I'm not sure what the phar test is doing, and I have not managed to run it in isolation.

I think you need to handle this case more carefully

are you addressing the functionality supplied by my patch, or the test in ext/phar/tests/phar_oo_009.phpt ?

@cmb69
Copy link
Member

cmb69 commented Aug 6, 2015

The test is failing, because there is a empty line in the CSV, which is not skipped, even though SplFileObject::SKIP_EMPTY is given. That is because the modified code puts a non zero length into intern->u.file.current_line_len, but the line would only be skipped if intern->u.file.current_line_len was zero.

Actually, it seems to me that unrelated whether SPL_FILE_OBJECT_READ_CSV is given, the buffer should be investigated from the end, replacing all \n and \r with \0 (decreasing line_len as well), and stopping as soon as another character is detected.

IOW: don't make a case differentiation, but replace the strcspn() with something that works from right-to-left.

@cmb69
Copy link
Member

cmb69 commented Aug 6, 2015

Diregard my suggested solution; that can't work.

@krakjoe
Copy link
Member

krakjoe commented Jan 4, 2017

Since this targets an unsupported branch of PHP, and since this seems to introduce inconsistencies of it's own, and since the author seems to have abandoned working on it, I'm closing this PR.

Please take this action as encouragement to open a clean PR against a supported branch of PHP.

@krakjoe krakjoe closed this Jan 4, 2017
@g105b
Copy link
Author

g105b commented Jan 4, 2017

Thanks, I'll look into the affects on 7 and whether it is still valid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants