Use DocumentKey as tie breaker in index backfill #3174


Merged
merged 7 commits into from
Nov 30, 2021

Conversation

schmidt-sebastian
Contributor

This PR changes the index backfill offset to use `readTime + documentKey` instead of just `readTime`. This helps when more than 50 documents share the same read time.
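The tie-breaking order the PR introduces can be sketched as a lexicographic comparison. A minimal illustration follows; the class and field names here are invented stand-ins for the SDK's `SnapshotVersion`/`DocumentKey` types, not the actual implementation:

```java
import java.util.Comparator;

// Sketch only: an offset is a (readTime, documentKey) pair. Documents are
// ordered by read time first, then by key, so the key breaks ties between
// documents that share one read time.
public class OffsetTieBreakSketch {
    static final class Offset {
        final long seconds;  // read time, seconds component
        final int nanos;     // read time, nanoseconds component
        final String key;    // document key, used as the tie breaker

        Offset(long seconds, int nanos, String key) {
            this.seconds = seconds;
            this.nanos = nanos;
            this.key = key;
        }
    }

    // Order by read time first, then by document key.
    static final Comparator<Offset> ORDER =
        Comparator.comparingLong((Offset o) -> o.seconds)
            .thenComparingInt(o -> o.nanos)
            .thenComparing(o -> o.key);

    public static void main(String[] args) {
        Offset a = new Offset(11, 2, "coll/docA");
        Offset b = new Offset(11, 2, "coll/docB"); // same read time, later key
        if (ORDER.compare(a, b) >= 0) throw new AssertionError("key should break the tie");
        System.out.println("ok");
    }
}
```

With read time alone, `a` and `b` above would compare as equal and a backfill page boundary between them could skip or repeat documents; the key comparison makes the order total.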

@google-cla google-cla bot added the cla: yes Override cla label Nov 24, 2021
@google-oss-bot
Contributor

google-oss-bot commented Nov 29, 2021

Coverage Report

Affected SDKs

  • firebase-firestore

    SDK overall coverage changed from 45.20% (ab6bab4) to 45.28% (e42ab4ff) by +0.08%.

    Filename Base (ab6bab4) Head (e42ab4ff) Diff
    AutoValue_FieldIndex_IndexOffset.java ? 62.50% ?
    DefaultQueryEngine.java 97.83% 97.87% +0.05%
    IndexBackfiller.java 77.38% 78.05% +0.67%
    PatchMutation.java 98.39% 100.00% +1.61%
    SQLiteIndexManager.java 99.36% 99.37% +0.01%
    SQLiteRemoteDocumentCache.java 95.83% 96.00% +0.17%
    SetMutation.java 94.29% 97.14% +2.86%

Test Logs

Notes

HTML coverage reports can be produced locally with ./gradlew <product>:checkCoverage.
Report files are located at <product-build-dir>/reports/jacoco/.

Head commit (e42ab4ff) is created by Prow via merging commits: ab6bab4 72489fc.

@google-oss-bot
Contributor

google-oss-bot commented Nov 29, 2021

Binary Size Report

Affected SDKs

  • firebase-crashlytics-ndk

    Type Base (ab6bab4) Head (e42ab4ff) Diff
    aar 1.68 MB 1.68 MB -443 B (-0.0%)
    apk (aggressive / x86_64) 1.44 MB 1.44 MB -4.10 kB (-0.3%)
    apk (release / x86_64) 2.07 MB 2.07 MB -4.10 kB (-0.2%)
  • firebase-firestore

    Type Base (ab6bab4) Head (e42ab4ff) Diff
    aar 1.23 MB 1.23 MB +2.89 kB (+0.2%)
    apk (release) 3.32 MB 3.33 MB +1.17 kB (+0.0%)

Test Logs

Notes

Head commit (e42ab4ff) is created by Prow via merging commits: ab6bab4 72489fc.

@schmidt-sebastian
Contributor Author

/test check-changed


@thebrianchen thebrianchen left a comment


lgtm, with a question about range exclusivity.

*/
public static IndexOffset create(SnapshotVersion readTime) {
long successorSeconds = readTime.getTimestamp().getSeconds();
int successorNanos = readTime.getTimestamp().getNanoseconds() + 1;


If we're doing a > comparison in the RemoteDocumentCache when fetching readTime, we don't want to increment here right? It makes sense to increment if we're doing a >= comparison.

Storing the last read time and using > could also remove the nano increment logic below.

Alternatively, you can keep the range-exclusive behavior and remove the increment.

Contributor Author


Added a comment:

      // We want to create an offset that matches all documents with a read time greater than
      // the provided read time. To do so, we technically need to create an offset for
      // `(readTime, MAX_DOCUMENT_KEY)`. While we could use Unicode codepoints to generate
      // MAX_DOCUMENT_KEY, it is much easier to use `(readTime + 1, DocumentKey.empty())` since
      // `> DocumentKey.empty()` matches all valid document IDs.
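The `(readTime + 1, DocumentKey.empty())` successor trick in the comment above can be sketched as follows. This is an illustration with invented names (not the SDK's `IndexOffset.create`), assuming nanoseconds carry into seconds at 1e9:

```java
// Sketch of computing the smallest timestamp strictly greater than the input,
// carrying the overflow into seconds when the nanoseconds component wraps.
public class SuccessorSketch {
    static final int NANOS_PER_SECOND = 1_000_000_000; // 1e9

    // Returns {seconds, nanos} of the successor timestamp.
    static long[] successor(long seconds, int nanos) {
        int n = nanos + 1;
        if (n == NANOS_PER_SECOND) {
            return new long[] {seconds + 1, 0}; // nanos wrapped: advance seconds
        }
        return new long[] {seconds, n};
    }

    public static void main(String[] args) {
        long[] s1 = successor(11, 2);
        if (s1[0] != 11 || s1[1] != 3) throw new AssertionError();
        // Nanosecond overflow advances the seconds component.
        long[] s2 = successor(1, NANOS_PER_SECOND - 1);
        if (s2[0] != 2 || s2[1] != 0) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Pairing this successor with the empty document key means a `>` comparison on `(readTime, key)` matches exactly the documents whose read time is strictly greater than the original, since every valid document ID sorts after the empty key.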

documents.isEmpty()
    ? remoteDocumentCache.getLatestReadTime()
    : documents.get(documents.size() - 1).getReadTime();


Can we use the latest read time here instead of IndexOffset.NONE? This way we can continue from the current read time on the next backfill iteration.

optional: add test that verifies we store/use the latest read time even if no documents were found.

Contributor Author


Hm, now I wonder why I did not do this right away. We should use the latest read time here for sure.
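The fix agreed on here can be sketched roughly as below; all names are simplified stand-ins for illustration, not the SDK's actual API:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch: when a backfill page comes back empty, advance the offset to the
// cache's latest read time instead of leaving it at the initial offset, so
// the next backfill iteration continues from where this one left off.
public class BackfillOffsetSketch {
    static final class Doc {
        final String key;
        final long readTime;
        Doc(String key, long readTime) { this.key = key; this.readTime = readTime; }
    }

    static long nextOffset(List<Doc> page, long latestReadTimeInCache) {
        return page.isEmpty()
            ? latestReadTimeInCache               // nothing matched: skip ahead
            : page.get(page.size() - 1).readTime; // continue after the last doc
    }

    public static void main(String[] args) {
        if (nextOffset(Collections.emptyList(), 42L) != 42L) throw new AssertionError();
        if (nextOffset(Arrays.asList(new Doc("a/b", 7L)), 42L) != 7L) throw new AssertionError();
        System.out.println("ok");
    }
}
```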

List<MutableDocument> expected = asList(doc("b/new", 3, docData));
assertEquals(expected, values(results));
}

@Test
public void testDocumentsMatchingQuerySinceReadTimeAndDocumentKey() {


optional: add test for nanoseconds range-exclusivity

Contributor Author


Would this be different from testDocumentsMatchingQuerySinceReadTime()? If so, can you provide a rough sketch?


@thebrianchen thebrianchen Nov 29, 2021


I was thinking you could modify or add the test to check for the nanosecond tiebreaker:

@Test
public void testDocumentsMatchingQuerySinceReadTimeAndDocumentKey() {
  Map<String, Object> docData = map("data", 2);
  addTestDocumentAtPath("b/a", /* updateTime= */ 1, /* readTime= */ version(11, 1));
  addTestDocumentAtPath("b/b", /* updateTime= */ 2, /* readTime= */ version(11, 2));
  addTestDocumentAtPath("b/c", /* updateTime= */ 3, /* readTime= */ version(11, 3));
  addTestDocumentAtPath("b/d", /* updateTime= */ 4, /* readTime= */ version(12, 2));

  Query query = Query.atPath(path("b"));
  ImmutableSortedMap<DocumentKey, MutableDocument> results =
      remoteDocumentCache.getAllDocumentsMatchingQuery(
          query, IndexOffset.create(version(11, 2)));
  List<MutableDocument> expected = asList(doc("b/c", 3, docData), doc("b/d", 4, docData));
  assertEquals(expected, values(results));
}

Contributor Author


Got it, that makes a lot of sense. Added one more test.

Contributor Author

@schmidt-sebastian schmidt-sebastian left a comment


Thanks for the review! A couple of follow-up questions.

@schmidt-sebastian schmidt-sebastian removed their assignment Nov 29, 2021
@Test
public void indexOffsetAdvancesSeconds() {
IndexOffset actualSuccessor = IndexOffset.create(version(1, (int) 1e9 - 1));
IndexOffset expectedSuccessor = IndexOffset.create(version(2, 0), DocumentKey.empty());


ultranit: no need to specify DocumentKey.empty()

Contributor Author


We do need to specify DocumentKey.empty() or else we create a key with version(3,0), DocumentKey.empty().

@schmidt-sebastian schmidt-sebastian merged commit d0048fd into master Nov 30, 2021
@schmidt-sebastian schmidt-sebastian deleted the mrschmidt/offset branch November 30, 2021 00:12
@firebase firebase locked and limited conversation to collaborators Dec 30, 2021