Use DocumentKey as tie breaker in index backfill #3174
Conversation
Coverage Report
Notes: HTML coverage reports can be produced locally. Head commit (e42ab4ff) is created by Prow via merging commits ab6bab4, 72489fc.

Binary Size Report
Notes: Head commit (e42ab4ff) is created by Prow via merging commits ab6bab4, 72489fc.

/test check-changed
lgtm, with a question about range exclusivity.
firebase-firestore/src/main/java/com/google/firebase/firestore/model/FieldIndex.java (comment resolved)
   */
  public static IndexOffset create(SnapshotVersion readTime) {
    long successorSeconds = readTime.getTimestamp().getSeconds();
    int successorNanos = readTime.getTimestamp().getNanoseconds() + 1;
If we're doing a > comparison in the RemoteDocumentCache when fetching readTime, we don't want to increment here, right? It makes sense to increment if we're doing a >= comparison. Storing the last read time and using > could also remove the nano increment logic below. Alternatively, you can keep the range-exclusive behavior and remove the increment.
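To illustrate the range-exclusivity point above: with a strict > on the whole (readTime, documentKey) tuple, a document exactly at the offset is excluded while a later key at the same read time is included, so no pre-increment is needed. This is a simplified sketch with illustrative names, not the SDK's actual API.

```java
public class RangeSketch {
  // Lexicographic comparison of (seconds, nanos, key) against an offset;
  // strict `>` means the offset itself is excluded (range-exclusive).
  static boolean isAfterOffset(
      long docSec, int docNanos, String docKey, long offSec, int offNanos, String offKey) {
    if (docSec != offSec) return docSec > offSec;
    if (docNanos != offNanos) return docNanos > offNanos;
    return docKey.compareTo(offKey) > 0;
  }

  public static void main(String[] args) {
    // Same read time, later key: included under strict `>`.
    System.out.println(isAfterOffset(11, 2, "b/c", 11, 2, "b/b")); // true
    // Exactly at the offset: excluded, so no increment is required.
    System.out.println(isAfterOffset(11, 2, "b/b", 11, 2, "b/b")); // false
  }
}
```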
Added a comment:
// We want to create an offset that matches all documents with a read time greater than
// the provided read time. To do so, we technically need to create an offset for
// `(readTime, MAX_DOCUMENT_KEY)`. While we could use Unicode codepoints to generate
// MAX_DOCUMENT_KEY, it is much easier to use `(readTime + 1, DocumentKey.empty())` since
// `> DocumentKey.empty()` matches all valid document IDs.
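The successor logic described in that comment can be sketched as follows; the nanosecond increment must carry into seconds when it overflows (as the indexOffsetAdvancesSeconds test later in this review checks). Types are simplified stand-ins here, not the SDK's SnapshotVersion/DocumentKey classes.

```java
public class OffsetSketch {
  static final int NANOS_PER_SECOND = 1_000_000_000;

  // Returns the {seconds, nanos} pair for `readTime + 1 nanosecond`,
  // carrying into seconds on overflow. Paired with an empty document key,
  // this offset matches every document read strictly after `readTime`,
  // since `> DocumentKey.empty()` holds for all valid document IDs.
  static long[] successor(long seconds, int nanos) {
    int successorNanos = nanos + 1;
    if (successorNanos == NANOS_PER_SECOND) {
      return new long[] {seconds + 1, 0};
    }
    return new long[] {seconds, successorNanos};
  }

  public static void main(String[] args) {
    long[] plain = OffsetSketch.successor(11, 2);
    System.out.println(plain[0] + "," + plain[1]); // 11,3
    long[] carried = OffsetSketch.successor(1, NANOS_PER_SECOND - 1);
    System.out.println(carried[0] + "," + carried[1]); // 2,0
  }
}
```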
      documents.isEmpty()
-         ? remoteDocumentCache.getLatestReadTime()
+         ? IndexOffset.NONE
          : documents.get(documents.size() - 1).getReadTime();
Can we use the latest read time here instead of IndexOffset.NONE? This way we can continue from the current read time on the next backfill iteration.
optional: add test that verifies we store/use the latest read time even if no documents were found.
Hm, now I wonder why I did not do this right away. We should use the latest read time here for sure.
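The fix the two comments above agree on can be sketched like this: when the batch is empty, fall back to the cache's latest read time rather than IndexOffset.NONE, so the next backfill iteration resumes from where this one left off instead of restarting from the beginning. Types and names here are simplified stand-ins, not the SDK's classes.

```java
import java.util.List;

public class NextOffsetSketch {
  // Hypothetical stand-ins for the SDK's IndexOffset and document types.
  record Offset(long readTimeSeconds) {}

  record Doc(long readTimeSeconds) {}

  // Picks the offset for the next backfill iteration: the last document's
  // read time, or the cache's latest read time when the batch was empty.
  static Offset nextOffset(List<Doc> batch, long latestCacheReadTime) {
    long readTime =
        batch.isEmpty()
            ? latestCacheReadTime
            : batch.get(batch.size() - 1).readTimeSeconds();
    return new Offset(readTime);
  }

  public static void main(String[] args) {
    // Empty batch: resume from the cache's latest read time (42), not zero.
    System.out.println(nextOffset(List.of(), 42).readTimeSeconds()); // 42
    // Non-empty batch: resume after the last document read (9).
    System.out.println(nextOffset(List.of(new Doc(7), new Doc(9)), 42).readTimeSeconds()); // 9
  }
}
```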
    List<MutableDocument> expected = asList(doc("b/new", 3, docData));
    assertEquals(expected, values(results));
  }

  @Test
  public void testDocumentsMatchingQuerySinceReadTimeAndDocumentKey() {
optional: add test for nanoseconds range-exclusivity
Would this be different from testDocumentsMatchingQuerySinceReadTime()? If so, can you provide a rough sketch?
I was thinking you could modify or add the test to check for the nanosecond tiebreaker:
@Test
public void testDocumentsMatchingQuerySinceReadTimeAndDocumentKey() {
  Map<String, Object> docData = map("data", 2);
  addTestDocumentAtPath("b/a", /* updateTime= */ 1, /* readTime= */ version(11, 1));
  addTestDocumentAtPath("b/b", /* updateTime= */ 2, /* readTime= */ version(11, 2));
  addTestDocumentAtPath("b/c", /* updateTime= */ 3, /* readTime= */ version(11, 3));
  addTestDocumentAtPath("b/d", /* updateTime= */ 4, /* readTime= */ version(12, 2));
  Query query = Query.atPath(path("b"));
  ImmutableSortedMap<DocumentKey, MutableDocument> results =
      remoteDocumentCache.getAllDocumentsMatchingQuery(query, IndexOffset.create(version(11, 2)));
  List<MutableDocument> expected = asList(doc("b/c", 3, docData), doc("b/d", 4, docData));
  assertEquals(expected, values(results));
}
Got it, that makes a lot of sense. Added one more test.
firebase-firestore/src/main/java/com/google/firebase/firestore/local/SQLiteSchema.java (comment resolved)
Thanks for the review! A couple of follow-up questions.
  @Test
  public void indexOffsetAdvancesSeconds() {
    IndexOffset actualSuccessor = IndexOffset.create(version(1, (int) 1e9 - 1));
    IndexOffset expectedSuccessor = IndexOffset.create(version(2, 0), DocumentKey.empty());
ultranit: no need to specify DocumentKey.empty()
We do need to specify DocumentKey.empty(), or else we create a key with version(3, 0), DocumentKey.empty().
This PR changes the index backfill offset to use read time plus document key instead of just read time. This helps when more than 50 documents share the same read time.
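A minimal model of why the tie breaker matters for paging: with a batch limit smaller than the number of documents sharing one read time, a read-time-only offset would re-fetch the same page forever, while a (readTime, documentKey) offset always makes progress. This is a simplified sketch, not the SDK's backfill code; all names are illustrative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BackfillSketch {
  // Hypothetical stand-in for a cached document: its read time and key.
  record Doc(long readTime, String key) {}

  // Returns up to `limit` documents strictly after the (readTime, key)
  // offset, ordered by read time and then by document key.
  static List<Doc> pageAfter(List<Doc> all, long offSec, String offKey, int limit) {
    return all.stream()
        .filter(
            d ->
                d.readTime() > offSec
                    || (d.readTime() == offSec && d.key().compareTo(offKey) > 0))
        .sorted(Comparator.comparingLong(Doc::readTime).thenComparing(Doc::key))
        .limit(limit)
        .toList();
  }

  public static void main(String[] args) {
    // 60 documents all sharing read time 7, with a page size of 50.
    List<Doc> docs = new ArrayList<>();
    for (int i = 0; i < 60; i++) docs.add(new Doc(7, String.format("c/%02d", i)));

    List<Doc> page1 = pageAfter(docs, 0, "", 50);
    Doc last = page1.get(page1.size() - 1);
    // Continuing from (readTime, key) reaches the remaining 10 documents;
    // continuing from read time alone would return the same 50 again.
    List<Doc> page2 = pageAfter(docs, last.readTime(), last.key(), 50);
    System.out.println(page1.size() + " then " + page2.size()); // 50 then 10
  }
}
```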