Skip to content

HHH-15045 avoid dirty-flush for OneToOne type #5061

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

NathanQingyangXu
Copy link
Contributor

@NathanQingyangXu NathanQingyangXu commented May 16, 2022

https://hibernate.atlassian.net/browse/HHH-15045

For the following simple OneToOne case:

@Entity
class User {
  @Id int id;

  @OneToOne(mappedBy = "user")
  Profile profile;
}

@Entity
class Profile {
  @Id int id;

  @OneToOne
  User user;
}

Internally the 'user' field in Profile would be treated as @ManyToOne, not @OneToOne. the mappedBy side is treated as the only @OneToOne type. For this reason, OneToOneType is always clean, never dirty.

This is a regression issue after #3590 (for https://hibernate.atlassian.net/browse/HHH-14216). After reverting back the changes in OneToOneType class, the testing case is still fine in that PR.

* We don't need to dirty check one-to-one because of how
* assemble/disassemble is implemented and because a one-to-one
* association is never dirty
*/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reverted original comment says all

@NathanQingyangXu NathanQingyangXu force-pushed the HHH-15045 branch 3 times, most recently from 60b538a to 58aad75 Compare May 17, 2022 01:02
return resolve( session.getContextEntityIdentifier( owner ), session, owner );
}
return null;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dreab8 , I saw there were quite some back and forth (see https://hibernate.atlassian.net/browse/HHH-15044); but I think after we revert back the dirty flush related code changes, the above code seems to become necessary again. I am not sure about this, though. Feel free to take over and work on top of my PR or even close it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NathanQingyangXu NathanQingyangXu marked this pull request as draft May 17, 2022 02:16
@deAtog
Copy link

deAtog commented May 17, 2022

Please do not change the behavior here. The one-to-one is dirty whenever it is associated becomes associated because it needs to track when it is associated. The association does not change the way the object is stored in the database, but does change how it is stored in the cache. If you revert this code you you will cause a cache miss and a database hit whenever resolve is called. This negatively impacts performance and is the reason it was changed.

@NathanQingyangXu
Copy link
Contributor Author

NathanQingyangXu commented May 17, 2022 via email

@deAtog
Copy link

deAtog commented May 18, 2022

Looking back at those tests, they do not really seem to test the dirty aspect of the OneToOneType with respect to caching. I think you should implement some other tests to ensure that the behavior is as expected before and after your changes.

Before my changes, disassemble always returned null. Now it returns the id of the associated object or null if there is nothing associated. Likewise, assemble previously always called resolve using the identifier of the owner. The new implementation returns null whenever the assembled identifier is null since the null value guarantees there was no associated object.

The changes in these implementations necessitate a dirty check because adding an associated object to an existing one when using a foreign key relationship should update the cache of the existing object, but not the database.

@NathanQingyangXu
Copy link
Contributor Author

NathanQingyangXu commented May 19, 2022

Looking back at those tests, they do not really seem to test the dirty aspect of the OneToOneType with respect to caching. I think you should implement some other tests to ensure that the behavior is as expected before and after your changes.

Before my changes, disassemble always returned null. Now it returns the id of the associated object or null if there is nothing associated. Likewise, assemble previously always called resolve using the identifier of the owner. The new implementation returns null whenever the assembled identifier is null since the null value guarantees there was no associated object.

The changes in these implementations necessitate a dirty check because adding an associated object to an existing one when using a foreign key relationship should update the cache of the existing object, but not the database.

I am not sure we should change dirtiness to suit the cache purpose. To my feelings, the dirtiness should be retained to db sync up issue only. We should strive to achieve both goals (1. no touch on dirtiness or dirtiness should always be false; 2. make second-level cache working). Seems your previous code changes in assemble() and disassemble() alone have achieved your cache requirements already, without needing to touch the dirtiness related methods. I am not sure whether my understanding is correct or not, thus the DRAFT status of this PR.

@deAtog
Copy link

deAtog commented May 19, 2022

As is right now, if it's marked dirty and there are no database changes the cache will be updated and the database will not. Unfortunately, there is not a way to determine whether or not it is dirty with respect to the cache or respect to the database. If you change the behavior here, performance will undoubtedly suffer whenever you have a local cache and a remote database. The purpose of my changes were to improve the cache performance while having a very minimal impact otherwise. Prior to my changes, whenever assemble was called with fetch="join" and a related object did not exist in the cache, a database query was performed to verify that there was indeed no related object. Given that the purpose of a one-to-one is to store extra data that is typically sparsely populated, these queries were very expensive to execute.

I don't fault you trying to optimize the little bit of overhead incurred here, but I do not think you are going about it the right way. Also, I think there are far bigger fish out there than this.

@NathanQingyangXu
Copy link
Contributor Author

As is right now, if it's marked dirty and there are no database changes the cache will be updated and the database will not. Unfortunately, there is not a way to determine whether or not it is dirty with respect to the cache or respect to the database. If you change the behavior here, performance will undoubtedly suffer whenever you have a local cache and a remote database. The purpose of my changes were to improve the cache performance while having a very minimal impact otherwise. Prior to my changes, whenever assemble was called with fetch="join" and a related object did not exist in the cache, a database query was performed to verify that there was indeed no related object. Given that the purpose of a one-to-one is to store extra data that is typically sparsely populated, these queries were very expensive to execute.

I don't fault you trying to optimize the little bit of overhead incurred here, but I do not think you are going about it the right way. Also, I think there are far bigger fish out there than this.

This is a big fish if you inspect https://hibernate.atlassian.net/browse/HHH-15045. Obviously a regression issue happens and we should fix it. From my understanding, you reuse the well-defined 'dirtiness' methods to go about your cache implementation performance optimization (not for feature for without these changes you testing case is still fine), but it might break a lot for many users are expecting the canonical dirtiness mechanism to go about their processing.
How about this? Let me try to achieve both goals of: 1) maintain 'dirtiness' expectation; 2) keep from degrading your second-level cache performance (as we discussed, there is no concern to break it) by JMH benchmark. I think we can make both sides satisfied.

@deAtog
Copy link

deAtog commented May 20, 2022

I reviewed the above regression. I think we both agree that some mechanism is need to differentiate the whether the one-to-one is dirty from a caching perspective vs a database perspective. Right now the automatic versioning seems to treat the dirtiness of the one-to-one as the only condition required to increment the version, which ultimately is what causes the database update to occur. Without versioning, there isn't an issue. None of the existing tests cover this scenario, hence the regression.

@NathanQingyangXu
Copy link
Contributor Author

NathanQingyangXu commented May 20, 2022 via email

@NathanQingyangXu
Copy link
Contributor Author

NathanQingyangXu commented May 21, 2022

I reviewed the above regression. I think we both agree that some mechanism is need to differentiate the whether the one-to-one is dirty from a caching perspective vs a database perspective. Right now the automatic versioning seems to treat the dirtiness of the one-to-one as the only condition required to increment the version, which ultimately is what causes the database update to occur. Without versioning, there isn't an issue. None of the existing tests cover this scenario, hence the regression.

I debugged your testing case and set breakpoints into isDirty() method and found it always returns false, never returning true. Given that these methods are purely designed for db dirtiness checking, it didn't surprise me (you didn't change the mechanism to use it for cache dirtiness purpose, and I think even if you succeeded, it is nontrivial and extremely difficult). I think your code changes in 'dirtiness' methods are unnecessary (actually it ends up hurting performance for repeated isDirty() method invocation always returning false). I think no JMH benchmark metrics are needed even.

I think we have good reason to revert back your code changes involving dirtiness checking, though yours in assemble() and disassemble() are valid.

I will consider adding a more compelling testing case and then I will label this PR as ready.

@NathanQingyangXu
Copy link
Contributor Author

NathanQingyangXu commented May 21, 2022

I found a good testing case and updated. When the OneToOne owner side update its association, the mappedBy side should not be marked as "dirty" ending up with issuing an unnecessary SQL update statement. The testing case uses @Version to prove that its version got bumped erroneously. If you check the SQL output in console, you could also see the user table was updated erroneously.

Obviously the bug was because of misunderstanding of the dirtiness related methods in OneToOneType in previous PR. Also the erroneous code changes was not justified for the testing case didn't prove we need to touch 'dirtiness' to make it work (which cannot be proved for dirtiness is only meant for db side, not cache side).

@dreab8 @beikov feel free to review this interesting PR.

@NathanQingyangXu NathanQingyangXu marked this pull request as ready for review May 21, 2022 13:39
@deAtog
Copy link

deAtog commented May 23, 2022

As I stated previously, the test cases I added during my initial PR did not test the dirtiness of the OneToOneType. I do not believe the changes you've made to assemble are correct. Calling resolve whenever the id is null will result in a cache miss and database hit in certain cases. If you go back to my initial PR, you'll see that I did the following:

  1. Added failing tests based on existing behavior
  2. Changed the way assemble and disassemble were implemented.
  3. Added implementation for IsDirty.

Prior to adding implementations for IsDirty all my tests failed.

As I stated before, I think more tests are needed here. The one test you've added demonstrates a problem, but you have not included enough tests to verify you are not breaking caching of the OneToOneType with fetch = "join" specified.

@NathanQingyangXu
Copy link
Contributor Author

As I stated previously, the test cases I added during my initial PR did not test the dirtiness of the OneToOneType. I do not believe the changes you've made to assemble are correct. Calling resolve whenever the id is null will result in a cache miss and database hit in certain cases. If you go back to my initial PR, you'll see that I did the following:

  1. Added failing tests based on existing behavior
  2. Changed the way assemble and disassemble were implemented.
  3. Added implementation for IsDirty.

Prior to adding implementations for IsDirty all my tests failed.

As I stated before, I think more tests are needed here. The one test you've added demonstrates a problem, but you have not included enough tests to verify you are not breaking caching of the OneToOneType with fetch = "join" specified.

I think we'd better limit the scope of this PR to dirtiness exclusively, not caching. If you were interested, maybe you can take over. But we'd better separate two issues: 1) cache; 2) dirtiness. The code I added was from @dreab8 to cover the later testing case after yours (with mappedBy scenario your testing case didn't cover). His code was reverted back but the testing case remained and I reverted back his code to cover that testing case exclusively.
I think the commonest usage of @OneToOne is the one on mappedBy side (see hibernate user guide for example), so the issue we talked about in this ticket is more serious. Could we fix this issue first and you can create a cache ticket later?
@dreab8 @beikov I wanna stop here for I think I've exposed the issue successfully and feel free to work on top of it or even close it.

@NathanQingyangXu
Copy link
Contributor Author

NathanQingyangXu commented May 23, 2022

As I stated previously, the test cases I added during my initial PR did not test the dirtiness of the OneToOneType. I do not believe the changes you've made to assemble are correct. Calling resolve whenever the id is null will result in a cache miss and database hit in certain cases. If you go back to my initial PR, you'll see that I did the following:

  1. Added failing tests based on existing behavior
  2. Changed the way assemble and disassemble were implemented.
  3. Added implementation for IsDirty.

Prior to adding implementations for IsDirty all my tests failed.

As I stated before, I think more tests are needed here. The one test you've added demonstrates a problem, but you have not included enough tests to verify you are not breaking caching of the OneToOneType with fetch = "join" specified.

I am confused for I did inspect your commit history carefully. Especially, I found prior to step 3, your testing case has succeeded already. Did you mean your existing testing cases uncommitted? I agree we should cover more testing cases, but given the current testing coverage status, I could only go thus far.
My new code is not related to your testing case (which has been successful due to your step 2) but later regression issue based on mappedBy scenario.

@liangbodeng
Copy link

We need this PR desperately, which reverts the change made to @OnetoOne dirty check in PR #3590. PR #3590 has been blocking our upgrading path from 5.4.x for months. It's nice to see caching improvement attempt, but we should ensure correctness first, which in this case is dirty check. There are other people like me eagerly waiting for it reverted back to old behavior.

@dreab8
Copy link
Member

dreab8 commented May 27, 2022

@NathanQingyangXu Unfortunately this change seems to introduce a regression, I ported a test we have in 5.x but not in 6 and it fails.

@beikov
Copy link
Member

beikov commented May 27, 2022

After some discussion, we came to the conclusion that the fix which was introduced with #3590 for HHH-14216 is invalid, and we will revert it based on what this PR here does. The proper performance improvement for one-to-one associations is IMO to introduce "tombstone" cache entries for the entity type of the one-to-one association, which serve as null markers. This way, we would hit the cache instead of the database and see the tombstone cache entry, which we interpret as null.

@gavinking
Copy link
Member

"tombstone" cache entries

OMG "tombstone" I love it.

@liangbodeng
Copy link

After some discussion, we came to the conclusion that the fix which was introduced with #3590 for HHH-14216 is invalid, and we will revert it based on what this PR here does. The proper performance improvement for one-to-one associations is IMO to introduce "tombstone" cache entries for the entity type of the one-to-one association, which serve as null markers. This way, we would hit the cache instead of the database and see the tombstone cache entry, which we interpret as null.

Really appreciate it! I can't wait to upgrade to the latest version of hibernate in our projects.

@dreab8
Copy link
Member

dreab8 commented Sep 22, 2022

superseded by #5330

@dreab8 dreab8 closed this Sep 22, 2022
@NathanQingyangXu
Copy link
Contributor Author

NathanQingyangXu commented Oct 11, 2022 via email

@deAtog
Copy link

deAtog commented Oct 17, 2022

Maybe, and maybe not. I dug into the code for versioning and right now all one-to-one relationships are versioned. For a one-to-one relationship, there are the following scenarios to consider.

  1. One-to-one association that assumes both the parent and the child have the same primary key values.
  2. One-to-one association that maps a foreign key column.
  3. One-to-one association from an embeddable class to another entity.

In the first case, the relationship from the parent to the child should not be versioned as the child is dependent upon the parent. In the reverse mapping from the child to the parent, the Id of the child should be versioned, but the one-to-one should not.

In the second case, the relationship from the parent to the child should not be versioned as the child is once again dependent upon the parent. In the reverse mapping from the child to the parent the one-to-one should be versioned as the parent Id is stored in a separate column.

In the third case, the relationship from the parent to the child should be versioned as the parent stores the Id of the child in a separate column. In the reverse mapping from the child to the parent, the one-to-one should not be versioned as the child contains no information about the parent.

If the versionability of the one-to-one is corrected then the version will not be incremented and no database update will occur. It is therefore my opinion that this issue can be solved by addressing the versionability of the one-to-one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants