Skip to content

insertBatch updateBatch upsertBatch deleteBatch from query #6689

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 40 commits into from
Dec 11, 2022

Conversation

sclubricants
Copy link
Member

@sclubricants sclubricants commented Oct 14, 2022

This PR adds the final piece of some changes to BaseBuilder.

This adds the ability to insert, update, upsert from a query rather than a set of data.

Much of the required pieces of this PR were already in place.

Checklist:

  • Securely signed commits
  • Component(s) with PHPDoc blocks, only if necessary or adds value
  • Unit testing, with >80% coverage
  • User guide updated
  • Conforms to style guide

@sclubricants sclubricants added enhancement PRs that improve existing functionalities database Issues or pull requests that affect the database layer 4.3 labels Oct 14, 2022
@sclubricants
Copy link
Member Author

@codeigniter4/database-team This is the last major piece for now. Please review.

@sclubricants sclubricants force-pushed the insertUpsertUpdateFromQuery branch 2 times, most recently from eb61658 to 9eeefad Compare October 15, 2022 04:01
@sclubricants
Copy link
Member Author

Perhaps it might be better to not pass a query through the $set parameter of th *Batch() methods and rely on using the fromQuery() method only. The $set parameter is really the same as in setData().

@sclubricants sclubricants requested a review from michalsn October 15, 2022 22:08
@sclubricants
Copy link
Member Author

sclubricants commented Oct 16, 2022

It should be noted that this design works on the assumption that the columns in the query are aliased to match those of the target table. While SQL usually works based on position and doesn't require the column names to match it is usually good practice as queries are much more readable. It also makes it much simpler to handle an upsert scenario where the target and source share the same name.. otherwise the developer would have to map them separately.
This also makes it possible to know which columns are used for the query.

@sclubricants sclubricants force-pushed the insertUpsertUpdateFromQuery branch 2 times, most recently from ab4e5eb to 3768583 Compare October 20, 2022 01:03
@kenjis kenjis added the stale Pull requests with conflicts label Oct 20, 2022
@sclubricants sclubricants force-pushed the insertUpsertUpdateFromQuery branch from 3768583 to 0ed47a5 Compare October 20, 2022 23:19
@sclubricants sclubricants removed the stale Pull requests with conflicts label Oct 20, 2022
@sclubricants
Copy link
Member Author

Example:

This is valid SQL:

INSERT INTO pageview (user_id, user_name, last_view)
SELECT id, name, created_date
FROM user
ON DUPLICATE KEY UPDATE
SET user_id = VALUES(id),
user_name = VALUES(name),
last_view = VALUES(created_date)

The problem with this is that the fields would have to be mapped: user_id => id

This PR will assume this is done like:

INSERT INTO pageview (user_id, user_name, last_view)
SELECT id AS user_id, name AS user_name, created_date AS last_view
FROM user
ON DUPLICATE KEY UPDATE
SET user_id = VALUES(user_id ),
user_name = VALUES(user_name ),
last_view = VALUES(last_view )

This way the mapping is done in SQL and allows to easily generate the SQL statements. The code will parse the field names from the select statement and use them to generate the other parts of the statement.

@kenjis
Copy link
Member

kenjis commented Oct 22, 2022

Sorry, we reorganized the changelog v4.3.0.
Please rebase.

@sclubricants sclubricants force-pushed the insertUpsertUpdateFromQuery branch from d5b087d to 71bd85c Compare October 22, 2022 17:31
*
* @param BaseBuilder|RawSql|string $query
*/
public function fromQuery($query): BaseBuilder
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this public method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to call the method directly rather than pass it in the *Batch function. I think it is easier to read the code when you do it this way.

$this->db->table('user')->insertBatch($query);

$this->db->table('user')->fromQuery($query)->insertBatch();

$this->db->table('user')->setData($data)->insertBatch();

I also debated weather or not I should accept queries in the $set parameter of the *Batch() methods. $set was really intended as the parameter for setData(). But it works passing query in *Batch() methods.

At some point It might be useful for other methods as well. We could always change it to public later I suppose though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a dumb question, but do we need this new method? I mean.. we have fromSubquery() - shouldn't this be handled by this method somehow? It doesn't look good from the API perspective.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fromSubquery() works a bit differently. It sets the query to QBFrom[0] . fromQuery() sets the query to QBOptions['fromQuery'] and QBKeys to the columns.

One option might be to move the fromQuery() logic to setData(). This kind of makes sense programming wise.. the query is an alternate to a dataset. The only problem might be is that people might try and use it where it is not supported.

Copy link
Member

@kenjis kenjis Oct 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following API seems better if it is possible, because we have fromSubquery() already:

$this->db->table('user')->fromSubquery($query)->insertBatch();

fromSubquery() works a bit differently. It sets the query to QBFrom[0] . fromQuery() sets the query to QBOptions['fromQuery'] and QBKeys to the columns.

It is just internal implementation. We can change the implementation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setDataAsQuery()

This is what setData() actually does.

fromQuery() was ok but it might get confused with fromSubQuery()

In the future I'd like to have a method that is something like queryAsTable() This would be similar to what fromSubQuery() does in that it puts the query in parenthesis and adds an alias. Basically the results of this function could go anywhere a table name could go. This method could be used instead of fromSubQuery() by just setting table(). It could also go in join().

setQueryAsData() is the only other option I see that might make sense.

One thing to keep in mind is that the API language doesn't have to necessarily follow the SQL. I mean for the most part yes it is nice but with some of these more advanced SQL statements they vary with each DBMS. I liked fromQuery() because it seemed clear that the data was coming from the query.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both fromQuery() and setQueryAsData() seem right to me because they describe well what will happen.

The only disadvantage of fromQuery() is that it has a very similar name to an already existing method, and people may be confused about why they're working differently.

For that reason, I would go with setQueryAsData().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For that reason, I would go with setQueryAsData().

I agree. I think this clearly presents itself as an alternative to setData().

I changed everything to this.

@kenjis how do you feel about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setQueryAsData() is much better than setData() and setQuery().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally with *Batch you use setData(). setQueryAsData() is a fitting alternative. It is self explanatory.

@kenjis kenjis added 4.3 and removed 4.4 labels Dec 8, 2022
@kenjis kenjis changed the title insert update upsert delete from query insertBatch updateBatch upsertBatch deleteBatch from query Dec 8, 2022
@kenjis
Copy link
Member

kenjis commented Dec 8, 2022

@codeigniter4/database-team
My question is:
Do we really need to accept a query with *batch() methods?
See #6689 (review)

My opinion is:
If a dev want to set a query as data, they should use the setQueryAsData() method.

@kenjis
Copy link
Member

kenjis commented Dec 8, 2022

If I specify two queries as follows, how should QueryBuilder behave?

$builder->setQueryAsData($query1)->onConstraint('email')
->updateFields($additionalUpdateField, true)->upsertBatch($query2)

@michalsn
Copy link
Member

michalsn commented Dec 8, 2022

If I specify two queries as follows, how should QueryBuilder behave?

$builder->setQueryAsData($query1)->onConstraint('email')
->updateFields($additionalUpdateField, true)->upsertBatch($query2)

I think this is pretty straightforward - this would be the equivalent of:

$builder->setQueryAsData($query1)->setQueryAsData($query2)->onConstraint('email')
->updateFields($additionalUpdateField, true)->upsertBatch();

So only $query2 would be used.

The same story is with onConstraint(). It also can be set via upsertBatch() or deleteBatch().

But... from the other side, *Batch() methods would be the only ones that allow this type of data, leading us to inconsistency. From that point of view, I think I agree with @kenjis. BaseBuilder and RawSql as a parameter should be treated as special cases and allowed to be set via a dedicated method only.

@kenjis
Copy link
Member

kenjis commented Dec 8, 2022

It is not bad idea that set*() method will override the existing data, because it is set not add.
But in CI set*() method may add the data. See #6536, #5674
Well, it is not consistent.

@michalsn
Copy link
Member

michalsn commented Dec 8, 2022

Yes, with, let's say array data, set*() will add or override existing keys. And it has sense.

But with a setQueryAsData() or onConstraint(), we can only have one instance, so the value will always be replaced. This might be worth mentioning in the docs.

@sclubricants
Copy link
Member Author

I think that the singularity of the method's name suggests that it can only be used once. Its not setQueriesAsData().

I have removed the BaseBuilder and RawSql inputs from $set parameter in *Batch() methods. I agree that it is a special use case and should not be included. The parameter types should mirror that of setData() and not setQueryAsData().

Copy link
Member

@michalsn michalsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@kenjis kenjis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@kenjis kenjis merged commit 8641cde into codeigniter4:4.3 Dec 11, 2022
@MGatner
Copy link
Member

MGatner commented Dec 11, 2022

🥳🎉🎊🕺

@sclubricants sclubricants deleted the insertUpsertUpdateFromQuery branch December 11, 2022 23:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
database Issues or pull requests that affect the database layer enhancement PRs that improve existing functionalities
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants