Skip to content

Rewrite string.Contains/string.StartsWith/string.EndsWith to SqlMethods.Like so DB can reuse query plan #2396

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

hazzik
Copy link
Member

@hazzik hazzik commented May 24, 2020

Fixes #725

@hazzik hazzik changed the title Pre-transform like arguments so the db can reuse the query plan Pre-transform arguments of like function so the db can reuse the query plan May 24, 2020
@hazzik hazzik changed the title Pre-transform arguments of like function so the db can reuse the query plan Rewrite string.Contains/string.StartsWith/string.EndsWith to SqlMethods.Like so DB can reuse query plan May 24, 2020
This reverts commit d1ec908.
return Expression.Call(
Like,
expression.Object,
Concat(expression.Arguments[0], Expression.Constant("%"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How escaping works for such queries? I mean do we really properly support queries like Where(x => x.Name.StartsWith("%"))?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not supported at the moment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related: #484, #488, NH-3726, 9be7d06.
The StartsWith, EndsWith, Contains cases were a comment "side note" in NH-3726 and #484. The main point of NH-3726 was only about SqlMethod.Like. #484 and #488 were PR doing this, and it seems the corresponding fix has ended up committed directly as 9be7d06.
Anyway we lack an opened issue for supporting escaping with these methods, and as said in #484, it will be a possible breaking change.

Copy link
Member Author

@hazzik hazzik May 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we have @fredericDelaporte. #877.

@fredericDelaporte
Copy link
Member

I have a concern about this change here:

The proposed change may in some cases play badly with SQL-Server parameter sniffing, especially when putting in common cache plans for StartsWith with others. SQL-Server can use an index with a StartsWith expression where parameter sniffing would see that the parameter value is startOfString%, so the like will be like N'startOfString%', and then the cached query plan would rely on that index. But that index use will then be useless with a subsequent parameter value of %searchedString, causing the query to perform badly due to re-using a inadequate query plan for that parameter value.

We may have better to keep the StartsWith case specially handled, with a SQL in the form of like @p1 + N'%'. For Contains and EndsWith, having the same query plan should not cause issues with parameter sniffing.

What are your thoughts about it?

@maca88
Copy link
Contributor

maca88 commented Jun 1, 2020

SQL-Server can use an index with a StartsWith expression where parameter sniffing would see that the parameter value is startOfString%

SQL-Server seems to create the same query plan, test case:

CREATE TABLE Location(
Id NVARCHAR(5) NOT NULL,
Value NVARCHAR(5) NOT NULL,
CONSTRAINT [PK_Location] PRIMARY KEY CLUSTERED (Id ASC)
)
CREATE NONCLUSTERED INDEX [IX_Location] ON Location (Value ASC)

DECLARE @cnt INT = 0;

WHILE @cnt < 10000
BEGIN
   INSERT INTO Location(Id, Value) VALUES('A' + CAST(@cnt AS VARCHAR), 'A' + CAST(@cnt AS VARCHAR))
   SET @cnt = @cnt + 1;
END;

DECLARE @id NVARCHAR(5)
DECLARE @id2 NVARCHAR(5)
SET @id = 'A1'
SET @id2 = 'A1%'

SELECT Value FROM Location WHERE Value LIKE 'A1' + '%'
SELECT Value FROM Location WHERE Value LIKE @id + '%'
SELECT Value FROM Location WHERE Value LIKE @id2

The last two queries have the same query plan:
image

On PostgreSQL the same query plan is created for all three queries:

create table location (
id varchar(5) not null constraint location_pkey primary key,
value varchar(5)
);
CREATE INDEX location_value_index ON location(value text_pattern_ops);

INSERT INTO location(id, value)
SELECT 'A' || g, 'A' || g
FROM generate_series(1, 10000) g;

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) SELECT Value FROM location WHERE Value LIKE 'A1' || '%'
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) SELECT Value FROM location WHERE Value LIKE :p1 || '%'
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) SELECT Value FROM location WHERE Value LIKE :p2

If the same applies for other databases then we don't need to handle StartsWith differently.

@fredericDelaporte
Copy link
Member

fredericDelaporte commented Jun 1, 2020

To check about parameter sniffing, you must try with different parameters values, with an index on the "liked" column, with some parameters values starting with % and some other without, and also, you must clear the query cache plan between each test. Read here for an in-depth explanation.

@maca88
Copy link
Contributor

maca88 commented Jun 2, 2020

Read here for an in-depth explanation.

Thanks for the link. I played with different parameters by clearing query cache each time and realized that also %p1% (Contains) and %p1 (EndsWith) may end out having a diffenent query plan, example (Sql Server):

DBCC FREEPROCCACHE;
exec sp_executesql N'SELECT l.Value FROM Location l left join Location l2 on l.Id = l2.Id WHERE l.Value LIKE @id2 AND l2.Id <> @id2', N'@id2 NVARCHAR(5)', @id2='%1%'
DBCC FREEPROCCACHE;
exec sp_executesql N'SELECT l.Value FROM Location l left join Location l2 on l.Id = l2.Id WHERE l.Value LIKE @id2 AND l2.Id <> @id2', N'@id2 NVARCHAR(5)', @id2='1%'
DBCC FREEPROCCACHE;
exec sp_executesql N'SELECT l.Value FROM Location l left join Location l2 on l.Id = l2.Id WHERE l.Value LIKE @id2 AND l2.Id <> @id2', N'@id2 NVARCHAR(5)', @id2='%1'

in the above example all three statements have a different query plan. Now I don't quite understand the following statement from #725:

The where clause in all cases above should be where order0_.[CODE] like @p0', and @p0 = '%001%' / @p0 = '001%' / @p0 = '%001', the one will gain more performance when the query above run many times.

From my current understanding, having one plan for all three statements won't increase the performance, infact it may only reduce it when the database engine would produce different query plans. Currently, we don't have any reported issues regarding producing too many query plans that would cause a performance drop, so maybe we shouldn't reuse the same query plan even for Contains and EndsWith.

@fredericDelaporte
Copy link
Member

Well, recomputing a query plan is slower than using an existing one, if the existing one is suitable. But this gain is in my view marginal against the gain of using a suitable plan.
Also for applications generating a lot of various queries, using more different query plans than needed may create excessive pressure on the query plan cache, if the SQL server is "small". (SQL Express, limited memory available, think about cloud deployment where the ops tries to have the smaller server possible for minimizing the exploitation cost, ... I am not on that side personally currently, the company I currently works for has a "huge over-sizing" policy about servers, but that is not the most popular way of handling things these days.)

@maca88
Copy link
Contributor

maca88 commented Jun 3, 2020

Well, recomputing a query plan is slower than using an existing one, if the existing one is suitable

Correct, I forgot about that. So these are the current differences:

Current master:
[+] Better performance when queries have different query plans
[-] Additional query plans

This PR:
[+] Less query plans, which means less memory usage
[+] Faster first time execution when reusing an existing plan (e.g. query using EndsWith would reuse a plan created by Contains method)
[-] Decrease in performance when queries would have different plans

For me the benefit of having a different query plan for each method outweighs the benefits from this PR, as a bad query plan may cause the query to run multiple times slower. In order to support both scenarios, I made an alternative PR #2411, that adds the ability to configure the pre-transformation by the user.

@hazzik
Copy link
Member Author

hazzik commented Jun 18, 2020

Closing in favor of #2411

@CetinBasoz
Copy link

This discussion seems to be the problem I am having :( We are using an old version of nHibernate where it generates StartsWith as:
@p + '%'

While on SQL server it may seem that both @p+'%' and the one that include '%' in parameter generating the same execution plan, SQL server generates a plan per call !

@fredericDelaporte
Copy link
Member

fredericDelaporte commented Nov 21, 2022

Have you seen this and tried using the new feature introduced by #2411 to solve the issue you encounter?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NH-2325 - problems with StartsWith / EndsWith / Contains / Substring / IndexOf
5 participants