Fix for diffing using iterable_compare_func with nested objects. #333

Merged: 5 commits, Aug 13, 2022

@dtorres-sf (Contributor) commented Aug 4, 2022

@seperman we have been using DeepDiff with the iterable_compare_func option I helped add a while back, and we ran into an issue with nested objects. If an iterable item moved and also had a nested iterable that gained objects, those objects were merged incorrectly. For example:

        t1 = {
            "TestTable": [
                {
                    "id": "022fb580-800e-11ea-a361-39b3dada34b5",
                    "name": "Max",
                    "NestedTable": [
                        {
                            "id": "022fb580-800e-11ea-a361-39b3dada34a6",
                            "NestedField": "Test Field"
                        }
                    ]
                },
                {
                    "id": "022fb580-800e-11ea-a361-12354656532",
                    "name": "Bob",
                    "NestedTable": [
                        {
                            "id": "022fb580-800e-11ea-a361-39b3dada34c7",
                            "NestedField": "Test Field 2"
                        },
                    ]
                },
            ]
        }
        t2 = {"TestTable": [
            {
                "id": "022fb580-800e-11ea-a361-12354656532",
                "name": "Bob (Changed Name)",
                "NestedTable": [
                    {
                        "id": "022fb580-800e-11ea-a361-39b3dada34c7",
                        "NestedField": "Test Field 2 (Changed Nested Field)"
                    },
                    {
                        "id": "new id",
                        "NestedField": "Test Field 3"
                    },
                    {
                        "id": "newer id",
                        "NestedField": "Test Field 4"
                    },
                ]
            },
            {
                "id": "adding_some_random_id",
                "name": "New Name",
                "NestedTable": [
                    {
                        "id": "random_nested_id_added",
                        "NestedField": "New Nested Field"
                    },
                    {
                        "id": "random_nested_id_added2",
                        "NestedField": "New Nested Field2"
                    },
                    {
                        "id": "random_nested_id_added3",
                        "NestedField": "New Nested Field43"
                    },
                ]
            }
        ]}

The items added to the nested tables in t2 were merged into a single item like this, rather than staying split between the moved item and the newly added one:

{
  "TestTable": [
    {
      "NestedTable": [
        {
          "id": "random_nested_id_added",
          "NestedField": "New Nested Field"
        },
        {
          "id": "new id",
          "NestedField": "Test Field 3"
        },
        {
          "id": "newer id",
          "NestedField": "Test Field 4"
        }
      ],
      "id": "adding_some_random_id",
      "name": "New Name"
    }
  ]
}

You can see these objects in the new test that was failing without these changes.

I have put comments on the two main changes describing them a bit more.
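For readers unfamiliar with the option, here is a minimal sketch of the kind of compare function the example above relies on, matching items by their "id" field. The name `compare_by_id` is hypothetical, and the fallback exception class exists only so the sketch runs where DeepDiff is not installed.

```python
try:
    # Exception class shipped with DeepDiff for custom compare functions.
    from deepdiff.helper import CannotCompare
except ImportError:
    # Stand-in (an assumption of this sketch) so it runs without DeepDiff.
    class CannotCompare(Exception):
        pass


def compare_by_id(x, y, level=None):
    """Treat two items as the same object when their "id" values match."""
    try:
        return x["id"] == y["id"]
    except (KeyError, TypeError):
        # No usable id: signal that the default comparison should be used.
        raise CannotCompare() from None
```

A function like this would be passed as `DeepDiff(t1, t2, iterable_compare_func=compare_by_id)`, so that the "Bob" entry is recognized as the same item even though it moved from index 1 to index 0.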

seperman and others added 5 commits May 12, 2022 23:41
This commit addresses two issues. First, it ensures that the diff indexes
for moved items are always relative to t2 (except for removed items),
to stay consistent with the rest of the diff types.

Second, when replaying moved items, it ensures that the new value is
put in place only after the added items are replayed. Since the moved items
already contain any nested items, there is no need to replay those nested
added items (doing so was causing items to get added twice).
@@ -709,7 +709,7 @@ def _diff_iterable_in_order(self, level, parents_ids=frozenset(), _original_type
 x,
 y,
 child_relationship_class=child_relationship_class,
-child_relationship_param=i)
+child_relationship_param=j)
Contributor Author

This change ensures that all indexes in the resulting diff are relative to t2, which is consistent with how added items are reported. If you are ignoring order, I don't think this changes anything, since i should be equal to j. But when using a compare function without ignoring order, we want the output indexes to be relative to t2 so that they can be replayed. Only removed items are relative to t1 (since they have to be), and removed items always get replayed first in delta objects.
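The difference between the two indexes can be illustrated with a small, self-contained sketch. It is a toy model, not DeepDiff's implementation: `matched_pairs` and `changed_names` are hypothetical helpers, and only the idea of pairing items with a compare function and reporting paths by the t2 index `j` comes from the comment above.

```python
def matched_pairs(old, new, same):
    """Yield (i, j, x, y) where old[i] and new[j] are the same item under `same`."""
    for i, x in enumerate(old):
        for j, y in enumerate(new):
            if same(x, y):
                yield i, j, x, y
                break


def changed_names(old, new, same):
    """Report name changes keyed by the item's position in `new` (t2)."""
    changes = {}
    for _i, j, x, y in matched_pairs(old, new, same):
        if x["name"] != y["name"]:
            # Keying by j (the t2 index) rather than i (the t1 index) makes
            # the path point at where the item lives after the change.
            changes[f"root[{j}]['name']"] = {
                "old_value": x["name"],
                "new_value": y["name"],
            }
    return changes


t1 = [{"id": "a", "name": "Max"}, {"id": "b", "name": "Bob"}]
t2 = [{"id": "b", "name": "Bob (Changed Name)"}, {"id": "c", "name": "New Name"}]
changes = changed_names(t1, t2, lambda x, y: x["id"] == y["id"])
print(changes)
```

Here "Bob" sits at index 1 in t1 and index 0 in t2, so the reported path is `root[0]['name']`, which is the position a replay would need after the removal of "Max" has been applied.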

@@ -6,11 +6,11 @@
 "root['Cars'][3]['production']"
 ],
 "values_changed": {
-"root['Cars'][0]['dealers'][1]['quantity']": {
+"root['Cars'][2]['dealers'][0]['quantity']": {
Contributor Author

The changes to the existing unit tests are needed because the output is now relative to t2 instead of t1, as mentioned in the comment above.

Comment on lines +267 to +269
# First we need to create a placeholder for moved items.
# This will then get replaced below after we go through added items.
# Without this items can get double added because moved store the new_value and does not need item_added replayed
Contributor Author

The code comment pretty much explains why we made this change: it has to do with the delta modifying data in place. Without this change, the nested items would get added twice, simply because the "moved" item carries the new value, which already includes any additions. So the additions inside moved items do not need to be replayed. This step just sets placeholders so that all the indexes are right when we go on to add items. We then replace each None with the real value after replaying the added items.

Owner

That makes sense!
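The placeholder idea discussed in this thread can be sketched with a toy replay function. This is an illustration under stated assumptions, not the Delta code: `replay` and its argument layout are hypothetical, all indexes are taken to be relative to t2, and nested additions aimed at a placeholder are simply skipped.

```python
import copy


def replay(base, moved, added, nested_added, use_placeholders):
    """Toy replay of list changes; every index is relative to t2.

    base         -- top-level list after removals have been applied
    moved        -- {t2_index: full new value of the moved item}
    added        -- {t2_index: value of a newly added item}
    nested_added -- {(t2_index, nested_index): nested item to insert}
    """
    result = copy.deepcopy(base)
    inserts = {idx: copy.deepcopy(value) for idx, value in added.items()}
    for idx, value in moved.items():
        # Either reserve the slot with None or drop the full value in now.
        inserts[idx] = None if use_placeholders else copy.deepcopy(value)
    for idx in sorted(inserts):
        result.insert(idx, inserts[idx])
    for idx, nested_idx in sorted(nested_added):
        parent = result[idx]
        if parent is None:
            continue  # placeholder: the moved value already holds this item
        parent["NestedTable"].insert(nested_idx, nested_added[(idx, nested_idx)])
    if use_placeholders:
        # Swap each placeholder for the real moved value at the very end.
        for idx, value in moved.items():
            result[idx] = copy.deepcopy(value)
    return result


bob_new = {"id": "bob", "NestedTable": [{"id": "c7"}, {"id": "new id"}, {"id": "newer id"}]}
new_item = {"id": "new", "NestedTable": [{"id": "n1"}]}
nested = {(0, 1): {"id": "new id"}, (0, 2): {"id": "newer id"}}

naive = replay([], {0: bob_new}, {1: new_item}, nested, use_placeholders=False)
fixed = replay([], {0: bob_new}, {1: new_item}, nested, use_placeholders=True)
print(len(naive[0]["NestedTable"]), len(fixed[0]["NestedTable"]))
```

Without placeholders, the moved item arrives with its three nested entries and then receives the two nested additions again, ending up with five. With placeholders, the slot stays empty while additions are replayed and the final value has the expected three.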

@dtorres-sf
Contributor Author

@seperman Just checking in to see if you have a moment to look at this PR.

Unrelated (sort of): I appreciate all your work maintaining this. I just signed my org up for a monthly donation.

@seperman
Owner

seperman commented Aug 13, 2022 via email

@seperman seperman changed the base branch from master to dev August 13, 2022 22:22
Owner

@seperman seperman left a comment

LGTM! Great catch.

@seperman seperman merged commit 6881666 into seperman:dev Aug 13, 2022