Skip to content

Add ordering to Index implementation #2995

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Sep 29, 2021
Merged

Conversation

schmidt-sebastian
Copy link
Contributor

@schmidt-sebastian schmidt-sebastian commented Sep 21, 2021

This PR adds limit() support to our query engine. Unfortunately, this means that we have to enforce ordering as we need to filter the "best n" matches directly in the database layer.

This again changes our indexing format. This test is a good example as to why:

  @Test
  public void testLimitAppliesOrdering() {
    indexManager.addFieldIndex(
        new FieldIndex("coll")
            .withAddedField(field("value"), FieldIndex.Segment.Kind.ORDERED)
            .withAddedField(field("value"), FieldIndex.Segment.Kind.CONTAINS));
    addDoc("coll/doc1", map("value", Arrays.asList(1, "foo")));
    addDoc("coll/doc2", map("value", Arrays.asList(3, "foo")));
    addDoc("coll/doc3", map("value", Arrays.asList(2, "foo")));
    Query query =
        query("coll")
            .filter(filter("value", "array-contains", "foo"))
            .orderBy(orderBy("value"))
            .limitToFirst(2);
    verifyResults(query, "coll/doc1", "coll/doc3");
  }

This filters by the new "array_value" column and then sorts by "directional_value". "array_value" contains one value per array element while "directional_value" contains the full array as is.

@google-oss-bot
Copy link
Contributor

google-oss-bot commented Sep 21, 2021

Coverage Report

Affected SDKs

  • firebase-firestore

    SDK overall coverage changed from 43.15% (d5eec48) to 43.26% (9259321c) by +0.11%.

    Filename Base (d5eec48) Head (9259321c) Diff
    FieldIndex.java 70.00% 76.47% +6.47%
    FirestoreIndexValueWriter.java 43.66% 53.52% +9.86%
    IndexBackfiller.java 50.00% 54.39% +4.39%
    OrderBy.java 96.55% 96.77% +0.22%
    PatchMutation.java 100.00% 98.39% -1.61%
    SQLiteIndexManager.java 95.83% 96.06% +0.23%
    Target.java 96.51% 95.51% -1.01%
    Value.java 42.74% 42.48% -0.26%
    Values.java 94.21% 94.63% +0.41%

Test Logs

Notes

HTML coverage reports can be produced locally with ./gradlew <product>:checkCoverage.
Report files are located at <product-build-dir>/reports/jacoco/.

Head commit (9259321c5) is created by Prow via merging commits: d5eec48 5957bd7.

@google-oss-bot
Copy link
Contributor

google-oss-bot commented Sep 21, 2021

Binary Size Report

Affected SDKs

  • firebase-firestore

    Type Base (d5eec48) Head (9259321c) Diff
    aar 1.20 MB 1.20 MB +616 B (+0.1%)
    apk (release) 3.24 MB 3.24 MB +488 B (+0.0%)

Test Logs

Notes

Head commit (9259321c5) is created by Prow via merging commits: d5eec48 5957bd7.

@schmidt-sebastian schmidt-sebastian changed the title Mrschmidt/limit Add ordering to Index implementation Sep 23, 2021
@schmidt-sebastian
Copy link
Contributor Author

cc @thebrianchen

@schmidt-sebastian
Copy link
Contributor Author

/test smoke-tests

Copy link
Contributor

@wu-hui wu-hui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the comment in SQLiteSchema.java first, all others are insignificant.

@@ -115,6 +114,27 @@ public boolean hasLimit() {
return endAt;
}

/** Returns the list of values that are used in ARRAY_CONTAINS and ARRAY_CONTAINS_ANY filter. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultranit: _CONTAINS or ARRAY_CONTAINS_ANY filter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -142,26 +162,17 @@ public Bound getLowerBound(FieldIndex fieldIndex) {
case LESS_THAN_OR_EQUAL:
filterValue = Values.getLowerBound(fieldFilter.getValue().getValueTypeCase());
break;
case NOT_EQUAL:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOT_EQUAL is also deleted here because we do not support it yet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can user the default value for this and other cases (see line 131 in the old code, 151 in the new code).

String lowerBoundOp,
@Nullable Object[] upperBounds,
String upperBoundOp) {
int statementCount =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this method (and below one) deserve some implementation comments, to explain what it is doing and how. I found them pretty hard to understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Comment added.

+ "uid TEXT, " // user id or null if there are no pending mutations
+ "document_name TEXT, "
+ "PRIMARY KEY (index_id, index_value, uid, document_name))");
+ "PRIMARY KEY (index_id, array_value, directional_value, uid, document_name))");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having both array_value and directional_value would lead less efficient queries actually:

sqlite> create table index_entries_test ( index_id INTEGER, directional_value TEXT, array_value TEXT, uid TEXT, document_id TEXT, PRIMARY KEY (index_id, array_value, directional_value, uid, document_id));
QUERY PLAN
`--SEARCH TABLE sqlite_master USING INTEGER PRIMARY KEY (rowid=?)

sqlite> select * from index_entries_test where (index_id=1 and directional_value == 'x') order by directional_value limit 2;
QUERY PLAN
`--SEARCH TABLE index_entries_test USING COVERING INDEX sqlite_autoindex_index_entries_test_1 (index_id=?)

sqlite> select * from index_entries_test where (index_id=1 and array_value == 'x') order by directional_value limit 2;
QUERY PLAN
`--SEARCH TABLE index_entries_test USING COVERING INDEX sqlite_autoindex_index_entries_test_1 (index_id=? AND array_value=?)

Note filtering on directional_value without filter array_value leads to full index scan.

If I am not mistaken, both order by and limit are served by our own code currently? So we can do both without this PR right?

If we do want to do this, directional_value should be before array_value as it is more common. For array queries we would then need to create a separate index on index_id, array_value and tell SQLite to use that explicitly.

I am leaning towards the status before this PR though, because other platforms will have to do it that way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only applying the query limit locally can be really inefficient since we would always read all matching values even if the limit is low. This then imposes ordering, which unfortunately makes it necessary to split up array values from the other field values (as explained in the PR description).

We can actually use the array_value here even if it is not set by simply enforcing that it is null. We already don't use filters that have an array_value for non ArrayContains/ArrayContains any queries.

EXPLAIN QUERY PLAN select * from index_entries_test where (index_id=1 and array_value IS NULL AND directional_value == 'x') order by directional_value limit 2;
QUERY PLAN
`--SEARCH TABLE index_entries_test USING COVERING INDEX sqlite_autoindex_index_entries_test_1 (index_id=? AND array_value=? AND directional_value=?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, always having a filter on array_value should fix this.

Please add a comment to document this in SQLiteIndexManager when the query is built.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, limit is probably still done in memory, by SQLite, but I think we can live with that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will merge this PR as is and add the comment in the follow up PR (the one that fixes IN queries).

@schmidt-sebastian schmidt-sebastian assigned wu-hui and unassigned wu-hui Sep 28, 2021
@google-oss-bot
Copy link
Contributor

@schmidt-sebastian: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
device-check-changed 5957bd7 link /test device-check-changed

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@wu-hui wu-hui assigned schmidt-sebastian and unassigned wu-hui Sep 28, 2021
@schmidt-sebastian schmidt-sebastian merged commit e8ac306 into master Sep 29, 2021
@schmidt-sebastian schmidt-sebastian deleted the mrschmidt/limit branch September 29, 2021 17:16
@firebase firebase locked and limited conversation to collaborators Oct 30, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cla: yes Override cla size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants