Skip to content

Delay entity insert on Persist until session is flushed #1754

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Dec 28, 2018

Conversation

hazzik
Copy link
Member

@hazzik hazzik commented Jun 16, 2018

For centuries people did believe in a myth that identity always triggers immediate insert. I'll prove them wrong.

Here is an explanation of how the persist() should work.

persist() is well defined. It makes a transient instance persistent. However, it doesn't guarantee that the identifier value will be assigned to the persistent instance immediately, the assignment might happen at flush time. The spec doesn't say that, which is the problem I have with persist().

persist() also guarantees that it will not execute an INSERT statement if it is called outside of transaction boundaries. This is useful in long-running conversations with an extended Session/persistence context.

A method like persist() is required.

save() does not guarantee the same, it returns an identifier, and if an INSERT has to be executed to get the identifier (e.g. "identity" generator, not "sequence"), this INSERT happens immediately, no matter if you are inside or outside of a transaction. This is not good in a long-running conversation with an extended Session/persistence context.

In fact, it did work like that before 2.0.0.Alpha1. But here the assumption has been made that shouldDelayIdentityInserts is always false, however, this is not true. If we stick to the commented logic requiresImmediateIdAccess is false on persist. As Persist() method has void return type and so does not require the ID.

This earlier change devalued the Persist() and made it a weird little brother of Save() with an unclear purpose of existence.

This is WIP currently and later it will contain the tests and port of HHH-11019 - delayed id generation if FlushMode is Manual or Commit.

Related to NH-3802.

@hazzik
Copy link
Member Author

hazzik commented Jun 17, 2018

I could not decide if I'm fixing a bug or making an improvement. Duh. /cc @fredericDelaporte

@fredericDelaporte
Copy link
Member

The JSR-220 states about persist:

If X is a new entity, it becomes managed. The entity X will be entered into the database at or
before transaction commit or as a result of the flush operation.

So an insert occurring out of a transaction and without an explicit flush can be considered a bug.

Now, as you have already flagged, this could be a breaking change: maybe some people access the id after a persist even on identity like ids.

I also wonder how well fare cases with dependencies: do we have tests for them? I mean, cases where we persist (and not save) entities referencing each others. Does NHibernate correctly retrieve the id in time for supplying adequate values to the foreign keys?

And what about batching? Is batching enabled thanks to this change? It looks easily doable for cases where the id is retrieved as a result set field. But if it is retrieved as an output parameter, that could be more complicated.

And finally, I wonder why, when we are inside a transaction, the insert should still happen immediately? (Apart from the facts: that it looks conform with the spec, which somehow allow the insert to happen anytime before commit; that it will not be permanent in case of explicit rollback or failure before commit.)
For me the recommended usage pattern of a session is to always use transactions (even very short lived). So the change here will not change anything for the recommended pattern, and I would then downgrade it to minor or even trivial priority.

@hazzik
Copy link
Member Author

hazzik commented Jun 17, 2018

And what about batching? Is batching enabled thanks to this change?

No

It looks easily doable for cases where the id is retrieved as a result set field.

Not so easy.

I also thought that this would enable batching of inserts of entities with identity-like identifiers. Basically we have 2 batching techniques: one which does not require the output – IBatcherFactory, and one which does – MultiQuery. In other words IBatcherFactory for ExecuteNonQuery and IMultiQuery for otherwise.

So, to enable batching for server-side identifier we would need to rewrite the IInsertGeneratedIdentifierDelegate implementors to use IMultiQuery (or rather IFutureValue).

@hazzik
Copy link
Member Author

hazzik commented Jun 17, 2018

The JSR-220 states about persist:

If X is a new entity, it becomes managed. The entity X will be entered into the database at or
before transaction commit or as a result of the flush operation.

So an insert occurring out of a transaction and without an explicit flush can be considered a bug.

It's not a bug because you have written yourself:

Apart from the facts: that it looks conform with the spec, which somehow allow the insert to happen anytime before commit; that it will not be permanent in case of explicit rollback or failure before commit.

So, technically what it says is that insert could happen any time before the Commit or Flush is called.

@fredericDelaporte
Copy link
Member

fredericDelaporte commented Jun 17, 2018

It is a bug, because it is not written it can happen before a flush. What I have written does not state otherwise. It is is written it can happen "at or before a commit". Then the next "or" start a new proposition, especially since it is worded in a way which cannot syntactically well combine as an alternative of the first "or".

So, technically what it says is that insert could happen any time before the Commit or Flush is called.

No. I understand it as saying it can happen at commit, or before commit, or at flush. Understanding it as meaning it can happen also before flush does look weird to me, due to the way that sentence is written.

@hazzik
Copy link
Member Author

hazzik commented Jun 17, 2018

The flush mode is a problem here. As with Auto and Always the flush itself can happen in any time.

@hazzik
Copy link
Member Author

hazzik commented Jun 18, 2018

And finally, I wonder why, when we are inside a transaction, the insert should still happen immediately?

This can be considered as an "optimization". Because transactions are atomic, the changes are fixed in the database at the commit. And it does not really matter when the INSERT is happening. One trivial disadvantage is possible gaps in the IDs if the transaction is rolled back.

I have ported HHH-11019 (sits locally), but I don't really understand why shall we care about flush mode when we are in a transaction. It does not really matter. I would say that it's better to be just this:

bool shouldDelayIdentityInserts = !requiresImmediateIdAccess;

This simple condition would promote the same behaviour regardless of the transactions.

@fredericDelaporte
Copy link
Member

fredericDelaporte commented Jun 18, 2018

The flush mode is a problem here. As with Auto and Always the flush itself can happen in any time.

Yes but these are still flushes, and documented. I should not have written "explicit flush" anywhere indeed, the spec does not handle that point. Moreover the user can avoid these flushes if he wishes to. So I do not see these flush modes as a trouble. They are just a thing users should be aware of, but that is just not always the case.

And finally, I wonder why, when we are inside a transaction, the insert should still happen immediately?

This can be considered as an "optimization".

Yes, that is somewhat what my following parenthesis was attempting to mean.

I have ported HHH-11019 (sits locally), but I don't really understand why shall we care about flush mode when we are in a transaction. It does not really matter. I would say that it's better to be just this:

bool shouldDelayIdentityInserts = !requiresImmediateIdAccess;

This simple condition would promote the same behaviour regardless of the transactions.

Agreed.
And if later, support of batching identity inserts is added, it will be usable in transactions too thanks to that. (By the way there will be a trap with this batching: auto-fk. The current batching logic orders inserts correctly for this case, but it would need also to split the batch, which it does not currently.)

I guess that on Hibernate side they have cared for flush mode in order to minimize the impact of the change. Maybe some kind of overcautious handling of changes. (They are doing the change for correctness of a pattern which can anyway not be correct with auto or always flush modes. So this aim does not require changing the behavior of identity inserts for these two flush modes. But that does not mean it is a good thing to keep that behavior.)

@fredericDelaporte
Copy link
Member

I intend to add the changed proposed by @hazzik:

bool shouldDelayIdentityInserts = !requiresImmediateIdAccess;

And then this PR would no more be WIP, the port of HHH-11019 being dropped as irrelevant to our needs.

@hazzik hazzik changed the title [WIP] Make Persist Great Again Delay entity insert on Persist until session is flushed Oct 30, 2018
@hazzik
Copy link
Member Author

hazzik commented Oct 30, 2018

@fredericDelaporte done

@hazzik
Copy link
Member Author

hazzik commented Oct 31, 2018

So, this does not work. I'm leaning towards dropping the last commit and leave as it was.

@fredericDelaporte
Copy link
Member

Maybe there are more troubles then.

Checking the NHibernate.Test.Operations.MergeFixture.MergeManaged test on master, changed for removing the transaction (s.Flush added instead), it does still succeed.

But it does fail the same way on your branch, even without the last commit, when the transaction is removed.

So that last change is, at least for this test, not the cause of the regression. There is a regression there even without that last change.

@hazzik
Copy link
Member Author

hazzik commented Nov 5, 2018

hibernate/hibernate-orm@333c190

@hazzik hazzik modified the milestones: 5.2, 5.3 Nov 5, 2018
@hazzik
Copy link
Member Author

hazzik commented Nov 5, 2018

@hazzik hazzik changed the title Delay entity insert on Persist until session is flushed WIP - Delay entity insert on Persist until session is flushed Nov 5, 2018
@hazzik
Copy link
Member Author

hazzik commented Dec 26, 2018

@fredericDelaporte I would like to drop AfterAction changes from 4497aff. I think Hibernate's is a better fix.

@fredericDelaporte

This comment has been minimized.

[Obsolete("Please use GetKey() instead.")]
protected internal object Key => GetKey();

protected object GetKey()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making this a method enables to generate an async counterpart.

@hazzik
Copy link
Member Author

hazzik commented Dec 26, 2018

After some more considerations, indeed update LoadedKey after flush appears to be more correct approach than Hibernate's one. However, updating only LoadedKey but not CurrentKey seems to be invalid.

Also, regards your previous comment, @fredericDelaporte. The exception is thrown because the code just prior PrepareCollectionForUpdate resolves the CurrentKey and it becomes an actual id, however LoadedKey still points to the delayed key. The LoadedKey indeed should have been resolved as a result of a previous flush.

@hazzik
Copy link
Member Author

hazzik commented Dec 26, 2018

I cannot imagine any situations where this could be required.

@fredericDelaporte can you come up with some test cases?

@fredericDelaporte

This comment has been minimized.

Copy link
Member

@fredericDelaporte fredericDelaporte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test cases for ownership added. (I have checked, Hibernate fix fails some of them.)

@fabiomaulo
Copy link
Member

fabiomaulo commented Apr 7, 2022

Has reported by @hazzik in his first comment
"persist() also guarantees that it will not execute an INSERT statement if it is called outside of transaction boundaries. This is useful in long-running conversations with an extended Session/persistence context."

What about inside of transaction boundaries?

I saw the bug introduced 15 years ago when we moved, quickly, NH forward to what was the current Hibernate version... and it was a SVN commit (lot of code in one shot) sorry for that, I'm guilty.
The comment NH Different behavior could be a TODO NH Different behavior but now the code is
bool shouldDelayIdentityInserts = !requiresImmediateIdAccess;
that does not take into account the transaction, so we have another bug (bug because some tests and the documentation says that we have to take into account the transaction boundaries).

The original code was:

bool inTxn = source.ConnectionManager.IsInActiveTransaction;
bool shouldDelayIdentityInserts = !inTxn && !requiresImmediateIdAccess;

Please make the FIX taking in account the transaction boundaries.
Thanks.

cc @fredericDelaporte

P.S. https://github.com/hibernate/hibernate-orm/blob/6815c2aa7cf0dd0bea86ff7c55ea9ce8ed7f3969/hibernate-core/src/main/java/org/hibernate/event/internal/AbstractSaveEventListener.java#L243-L244

@hazzik
Copy link
Member Author

hazzik commented Apr 7, 2022

What about inside of transaction boundaries?

The specification only cares about persist() being called outside of transaction boundaries. And it does not have a defined behavior when inside the transaction boundaries. So "we have another bug" is non sequitur:

  1. If A is true, then B is true.
  2. A is false.
  3. Therefore, B is false.

While B can indeed be false, this cannot be linked to the premise since the statement is a non sequitur. This is called denying the antecedent.

A = persist() is called outside of transaction boundaries
B = persist() will not execute an INSERT

  1. "persist() also guarantees that it will not execute an INSERT statement if it is called outside of transaction boundaries"
  2. persist() is called inside transaction boundaries
  3. persist() will execute an INSERT. <- this is non sequitur

Not executing INSERT while inside a transaction IS a valid behavior. In other words persist() can issue INSERT while inside a transaction, but not required to do so.

If you look at commit history and discussion we were going back and forth between the original and current implementation. So, at the end we decided that actions from persist would be executed on flush. Which does complies to the statements above about the persist's behavior and it complies to JSR-220:

If X is a new entity, it becomes managed. The entity X will be entered into the database at or
before transaction commit or as a result of the flush operation.

I'm not against the code which is in Hibernate, however "fixing it" requires a sacrifice on another well defined behavior, the behavior of FlushMode.

@fabiomaulo
Copy link
Member

@hazzik
Persist make the entity persistent which means it will be managed and observed by the session.
Outside transaction the Persist should not perform an INSERT.
Inside the transaction, something like this:

using(var tx = session.BeginTransaction())
{
var e = new MyEntity{ SomeProp = "SomeValue" };
session.Persist(e);
e.SomeProp = "AnotherValue";
tx.Commit();
}

should result in an INSERT and an UPDATE;

Let me show you a succinct more realistic case:

using(var tx = session.BeginTransaction())
{
var e = new MyEntity{ SomeProp = "SomeValue" };
session.Persist(e);
DomainEvents.Raise(new MyEntityCreated{ Subject = e });
tx.Commit();
}

In this case, a bunch of events may change the state of MyEntity before the Commit. The name of the event (MyEntityCreated) is not casual.

The advantage of the usage of Persist is when you work with a bunch of entities just using the session (without begins a transaction). Even the current Hibernate behavior takes into account the transaction state so we continue having a different behavior in NHibernate.
As you saw, Persist and Merge had a different behavior inside a transaction than outside a transaction.

BTW, I would avoid a discussion. I understood that this will be the new behavior from v5.3.x, I know where is the source problem and, inside my current work, the usage of SaveOrUpdate (always inside a transaction) may be the final cut to continue working with my old friend NHibernate.

Thanks to you all for your hard work with NHibernate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants