Commit 8a65860

jurrian, pre-commit-ci[bot], and matthewhegarty authored
Skip empty rows in XLSX (#2028)
* Prevent empty lines in XLSX
* Test for create_dataset empty rows
* Update AUTHORS
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Fix
* Add IMPORT_EXPORT_IMPORT_IGNORE_BLANK_LINES flag
* Add docs
* Update tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* updated docs
* updated changelog
* Update changelog.rst
* performance improvement

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: matthewhegarty <[email protected]>
Co-authored-by: Matt Hegarty <[email protected]>
1 parent ffe94d1 commit 8a65860

6 files changed, with 96 additions and 4 deletions

AUTHORS

Lines changed: 1 addition & 0 deletions
@@ -158,3 +158,4 @@ The following is a list of much appreciated contributors:
 * 19greg96 (Gergely Karz)
 * AyushDharDubey
 * dahvo (David Mark Awad)
+* jurrian

docs/changelog.rst

Lines changed: 5 additions & 0 deletions
@@ -5,6 +5,11 @@ Changelog
 
 If upgrading from v3, v4 introduces breaking changes. Please refer to :doc:`release notes<release_notes>`.
 
+4.3.6 (unreleased)
+------------------
+
+- Add flag to ignore empty rows in XLSX import (`2028 <https://github.com/django-import-export/django-import-export/issues/2028>`_)
+
 4.3.5 (2025-02-01)
 ------------------
 

docs/faq.rst

Lines changed: 11 additions & 2 deletions
@@ -184,12 +184,21 @@ How to create relation during import if it does not exist
 
 See :ref:`creating-non-existent-relations`.
 
-How to handle large file uploads
----------------------------------
+How to handle large file imports
+--------------------------------
 
 If uploading large files, you may encounter time-outs.
 See :ref:`Using celery<celery>` and :ref:`bulk_import:Bulk imports`.
 
+Performance issues or unexpected behavior during import
+-------------------------------------------------------
+
+This could be due to hidden rows in Excel files.
+Hidden rows can be excluded using :ref:`import_export_import_ignore_blank_lines`.
+
+Refer to `this PR <https://github.com/django-import-export/django-import-export/pull/2028>`_ for more information.
+
+
 How to use field other than `id` in Foreign Key lookup
 ------------------------------------------------------
 

docs/installation.rst

Lines changed: 10 additions & 0 deletions
@@ -254,6 +254,16 @@ The values must be those provided in ``import_export.formats.base_formats`` e.g
 
 This can be set for a specific model admin by declaring the ``export_formats`` attribute.
 
+.. _import_export_import_ignore_blank_lines:
+
+``IMPORT_EXPORT_IMPORT_IGNORE_BLANK_LINES``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If set to ``True``, rows without content will be ignored in XLSX imports.
+This prevents an old Excel 1.0 bug which causes openpyxl's ``max_row`` to count all
+logically empty rows. Some editors (like LibreOffice) might add :math:`2^{20}` empty rows to the
+file, which causes a significant slowdown. By default this is ``False``.
+
 .. _exampleapp:
 
 Example app
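
As an illustration of the setting documented in the hunk above, a minimal sketch of enabling the flag, assuming a standard Django project settings module (the file name ``settings.py`` is an assumption; only the setting name comes from this commit):

```python
# settings.py (excerpt; the file name is an assumption for illustration)

# Skip XLSX rows whose cells are all empty during import.
# The library default is False, which keeps empty rows for backwards compatibility.
IMPORT_EXPORT_IMPORT_IGNORE_BLANK_LINES = True
```

Leaving the setting unset keeps the previous behavior, since the code reads it with a default of ``False``.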

import_export/formats/base_formats.py

Lines changed: 10 additions & 1 deletion
@@ -208,9 +208,18 @@ def create_dataset(self, in_stream):
         rows = sheet.rows
         dataset.headers = [cell.value for cell in next(rows)]
 
+        ignore_blanks = getattr(
+            settings, "IMPORT_EXPORT_IMPORT_IGNORE_BLANK_LINES", False
+        )
         for row in rows:
             row_values = [cell.value for cell in row]
-            dataset.append(row_values)
+
+            if ignore_blanks:
+                # do not add empty rows to dataset
+                if not all(value is None for value in row_values):
+                    dataset.append(row_values)
+            else:
+                dataset.append(row_values)
         return dataset
 
     def export_data(self, dataset, **kwargs):
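
The filtering rule in the hunk above can be sketched in isolation. This is a standalone illustration, not the library's code: ``is_blank_row`` and ``collect_rows`` are hypothetical helper names, and plain lists stand in for openpyxl rows and the tablib dataset.

```python
def is_blank_row(row_values):
    """Return True when every cell value in the row is None."""
    return all(value is None for value in row_values)


def collect_rows(rows, ignore_blanks=False):
    """Collect rows, optionally skipping rows whose cells are all None.

    Mirrors the branch in the diff: with the flag off, every row is kept.
    """
    collected = []
    for row_values in rows:
        if ignore_blanks and is_blank_row(row_values):
            continue  # do not add empty rows
        collected.append(row_values)
    return collected


rows = [
    ["Data1", "Data2", "Data3"],
    [None, None, None],     # fully empty: skipped when the flag is on
    ["Data1", None, None],  # partially filled: always kept
]
kept = collect_rows(rows, ignore_blanks=True)
everything = collect_rows(rows, ignore_blanks=False)
```

Only ``None`` counts as empty under this rule, so a row containing an empty string would still be kept.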

tests/core/tests/test_base_formats.py

Lines changed: 59 additions & 1 deletion
@@ -1,10 +1,12 @@
 import os
 import unittest
+from io import BytesIO
 from unittest import mock
 
+import openpyxl
 import tablib
 from core.tests.utils import ignore_utcnow_deprecation_warning
-from django.test import TestCase
+from django.test import TestCase, override_settings
 from django.utils.encoding import force_str
 from tablib.core import UnsupportedFormat
 
@@ -115,6 +117,62 @@ def test_that_load_workbook_called_with_required_args(self, mock_load_workbook):
             unittest.mock.ANY, read_only=True, data_only=True
         )
 
+    @override_settings(IMPORT_EXPORT_IMPORT_IGNORE_BLANK_LINES=False)
+    def test_xlsx_create_dataset__empty_rows(self):
+        """Default situation without the flag: do not ignore the empty rows for
+        backwards compatibility.
+        """
+        rows_before = 3
+        empty_rows = 5
+        rows_after = 2
+
+        wb = openpyxl.Workbook()
+        ws = wb.active
+        ws.append(["Header1", "Header2", "Header3"])
+
+        for _ in range(rows_before):
+            ws.append(["Data1", "Data2", "Data3"])
+
+        for _ in range(empty_rows):
+            ws.append([None, None, None])
+
+        for _ in range(rows_after):
+            ws.append(["Data1", "Data2", "Data3"])
+
+        xlsx_data = BytesIO()
+        wb.save(xlsx_data)
+        xlsx_data.seek(0)
+
+        dataset = self.format.create_dataset(xlsx_data.getvalue())
+        assert len(dataset) == rows_before + empty_rows + rows_after  # With empty rows
+
+    @override_settings(IMPORT_EXPORT_IMPORT_IGNORE_BLANK_LINES=True)
+    def test_xlsx_create_dataset__ignore_empty_rows(self):
+        """Ensure that empty rows are not added to the dataset."""
+        rows_before = 3
+        empty_rows = 5
+        rows_after = 2
+
+        wb = openpyxl.Workbook()
+        ws = wb.active
+        ws.append(["Header1", "Header2", "Header3"])
+
+        for _ in range(rows_before):
+            ws.append(["Data1", "Data2", "Data3"])
+
+        for _ in range(empty_rows):
+            ws.append([None, None, None])
+
+        for _ in range(rows_after):
+            ws.append(["Data1", "Data2", "Data3"])
+
+        xlsx_data = BytesIO()
+        wb.save(xlsx_data)
+        xlsx_data.seek(0)
+
+        dataset = self.format.create_dataset(xlsx_data.getvalue())
+        assert len(dataset) == rows_before + rows_after  # Without empty rows
+
 
 class CSVTest(TestCase):
     def setUp(self):
