-
Notifications
You must be signed in to change notification settings - Fork 624
Add mutation batch id to index backfill #3202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
} | ||
|
||
/** Updates the provided remote documents to include documents in the overlay. */ | ||
private ImmutableSortedMap<DocumentKey, MutableDocument> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried refactoring out shared code between getDocumentsMatchingCollectionQueryFromOverlayCache
and getDocumentsCollectionQueryWithBatchId
, but I think it can probably be cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are probably at a point where we should just remove the non-overlay code.
Coverage Report 1Affected ProductsNo changes between base commit (46b3dfd) and merge commit (b025f3a).Test Logs
Notes |
Size Report 1Affected Products
Test Logs
Notes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some early feedback (did not look at test). Looks pretty good, but we may want to think about how we can remove the need to read documents multiple times.
See also for some changes that may affect this: #3193
firebase-firestore/src/main/java/com/google/firebase/firestore/index/IndexEntry.java
Outdated
Show resolved
Hide resolved
*/ | ||
public static <T extends Comparable<T>> void diffCollections( | ||
SortedSet<T> before, SortedSet<T> after, Consumer<T> onAdd, Consumer<T> onRemove) { | ||
diffCollections(before.iterator(), after.iterator(), before.comparator(), onAdd, onRemove); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This overload is actually different from the one we now default to. This method takes sorted data - the method you are calling now reorders the data (and copies it) even when already sorted. I think you solve the problem you encountered - that SortedSet<T>.comparator()
can return null
- by creating a default comparator based on T.compareTo()
if the set's comparator is null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks. Added the code back and added a null check.
* change past `sinceBatchId` are returned. | ||
*/ | ||
Map<DocumentKey, Pair<Integer, Mutation>> getOverlaysWithBatchId( | ||
ResourcePath collection, int sinceBatchId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am surprised we don't have a high-level class that contains both the mutation and the batch ID for Overlays. We should probably create one and combine this method with the existing method just above. (cc @wu-hui)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed this method into the original getOverlays()
and changed logic to support Pair
.
|
||
// Start indexing mutations only after catching up for read time. This must be done separately | ||
// since the backfill's first pass is in read time order rather than mutation batch id order. | ||
int documentsRemaining = documentsRemainingUnderCap - oldestDocuments.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this to a helper method for better encapsulation? We might also want to do the same for the RemoteDocumentCache code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved into LocalDocumentsView
// order by batch id. | ||
int newLargestBatchId = | ||
documentToBatchId.get(documentsByBatchId.get(documentsByBatchId.size() - 1)); | ||
newOffset = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this into getNewOffset
? (as a final call at the end of the method)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
N/A after moving into LocalDocumentsView.
* corresponding batch id. | ||
*/ | ||
// TODO(indexing): Support adding local mutations to collection group table. | ||
public Map<Document, Integer> getDocumentsBatchIdsMatchingCollectionGroupQuery( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your code would be cleaner if this method returned Map<BatchId, List<Document>>
.
The name is also quite a mouthful - and the method should probably just accept a String collectionGroup
rather than a query as it ignores all other query constraints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, but cannot think of a cleaner name.
firebase-firestore/src/main/java/com/google/firebase/firestore/local/IndexBackfiller.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
/** Updates the provided remote documents to include documents in the overlay. */ | ||
private ImmutableSortedMap<DocumentKey, MutableDocument> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are probably at a point where we should just remove the non-overlay code.
@@ -105,4 +105,32 @@ public void removeOverlaysForBatchId(int batchId) { | |||
|
|||
return result; | |||
} | |||
|
|||
@Override | |||
public Map<DocumentKey, Pair<Integer, Mutation>> getOverlaysWithBatchId( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please find a way to remove this duplication - even if we do not add an Overlay class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Combined the two methods into a single one.
@@ -111,4 +112,34 @@ public void removeOverlaysForBatchId(int batchId) { | |||
|
|||
return result; | |||
} | |||
|
|||
@Override | |||
public Map<DocumentKey, Pair<Integer, Mutation>> getOverlaysWithBatchId( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
…/index/IndexEntry.java Co-authored-by: Sebastian Schmidt <[email protected]>
…-sdk into bc/mutation-id
documentsProcessed = backfiller.backfill(); | ||
assertEquals(1, documentsProcessed); | ||
SortedSet<IndexEntry> after = indexManager.getExistingIndexEntries(docA.getKey(), fieldIndex); | ||
assertNotEquals(before.first().getDirectionalValue(), after.first().getDirectionalValue()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of relying on getExistingIndexEntries, can you just the public API and invoke getDocumentsMatchingTarget
with two different targets (foo > 10 and foo > 20 for example). Before the update, you can verify that one of them matches and then after the update two match.
Same comment applies below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests look great - but we probably do not need to use the private API.
324714c
to
add5707
Compare
Main changes:
/test check-changed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a huge step in the right direction. We should talk about the refactor to add (and use) the BatchId in the Document class. I see that you added the code to populate it - maybe it will now be easy enough to also use it? It would allow you to stop passing Pair<integer, document>
types.
@@ -47,11 +48,14 @@ | |||
void removeOverlaysForBatchId(int batchId); | |||
|
|||
/** | |||
* Returns all saved overlays for the given collection. | |||
* Returns a mapping of overlays containing the largest batch id for each overlay mutation in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can probably revert this documentation change now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* | ||
* @param collection The collection path to get the overlays for. | ||
* @param sinceBatchId The minimum batch ID to filter by (exclusive). Only overlays that contain a | ||
* change past `sinceBatchId` are returned. | ||
* @return Mapping of each document key in the collection to its largest batch id and overlay | ||
* mutation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
processedCollectionGroups.add(collectionGroup); | ||
} | ||
return maxDocumentsToProcess - documentsRemaining; | ||
} | ||
|
||
/** Writes entries for the fetched field indexes. */ | ||
/** | ||
* Writes entries for the corresponding field indexes in the provided collection group. Returns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update as their are no "corresponding" indexes (at least none are passed to the method)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
firebase-firestore/src/main/java/com/google/firebase/firestore/local/IndexBackfiller.java
Show resolved
Hide resolved
lowestOffset = lowestOffset == null ? IndexOffset.NONE : lowestOffset; | ||
|
||
// Add earliest batch id to offset | ||
int earliestBatchId = getEarliestBatchId(fieldIndexes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you inline this call? You are iterating over the list of field indexes twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
// Apply overlays to empty mutable document map to generate base. | ||
for (Map.Entry<DocumentKey, Overlay> entry : overlays.entrySet()) { | ||
remoteDocuments.put(entry.getKey(), MutableDocument.newInvalidDocument(entry.getKey())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we not need to fetch the documents from the RDC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the document overlay cache contains the most up to date version of the documents, we don't need to.
mutation.applyToLocalView(docEntry.getValue(), null, batchId, Timestamp.now()); | ||
} | ||
// Finally, insert the documents that still match the query. | ||
if (query.matches(docEntry.getValue())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should do this once at the end of getNextDocumentsAndOffset
.
FWIW, we might not need to do this at all if we pass a collection group here instead of the query. None of the IndexBackfill methods need a query.
return getDocumentKey().compareTo(other.getDocumentKey()); | ||
cmp = getDocumentKey().compareTo(other.getDocumentKey()); | ||
if (cmp != 0) return cmp; | ||
return Integer.signum(getLargestBatchId() - other.getLargestBatchId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Integer.compare? (for noobs like me)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -182,6 +184,10 @@ public SnapshotVersion getReadTime() { | |||
return readTime; | |||
} | |||
|
|||
public int getMutationBatchID() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem used but I wish it was :)
* <p>Holds information about a mutation and the largest batch id in Firestore when the mutation was | ||
* created. | ||
*/ | ||
public class Overlay { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably want to do one of the following:
- Make this an AutoValue
- Make this a Kotlin value class
- Add equals/hashCode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turned it into an AutoValue.
/** Returns the offset for the index based on the newly indexed documents. */ | ||
private IndexOffset getNewOffset( | ||
ImmutableSortedMap<DocumentKey, Document> documents, IndexOffset currentOffset) { | ||
if (documents.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW This should eventually live in the IndexBackfiller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline, this probably does not and we need to leave it here.
@thebrianchen: The following test failed, say
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments are now outdated + I don't really remember their original context. Please review this as a rough draft. Will clean up the comments later.
* | ||
* @param collection The collection path to get the overlays for. | ||
* @param sinceBatchId The minimum batch ID to filter by (exclusive). Only overlays that contain a | ||
* change past `sinceBatchId` are returned. | ||
* @return Mapping of each document key in the collection to its largest batch id and overlay | ||
* mutation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -47,11 +48,14 @@ | |||
void removeOverlaysForBatchId(int batchId); | |||
|
|||
/** | |||
* Returns all saved overlays for the given collection. | |||
* Returns a mapping of overlays containing the largest batch id for each overlay mutation in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
processedCollectionGroups.add(collectionGroup); | ||
} | ||
return maxDocumentsToProcess - documentsRemaining; | ||
} | ||
|
||
/** Writes entries for the fetched field indexes. */ | ||
/** | ||
* Writes entries for the corresponding field indexes in the provided collection group. Returns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
firebase-firestore/src/main/java/com/google/firebase/firestore/local/IndexBackfiller.java
Show resolved
Hide resolved
lowestOffset = lowestOffset == null ? IndexOffset.NONE : lowestOffset; | ||
|
||
// Add earliest batch id to offset | ||
int earliestBatchId = getEarliestBatchId(fieldIndexes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
*/ | ||
public Map<Document, Integer> getDocumentsInOverlayCacheByBatchId( | ||
Query query, IndexOffset offset) { | ||
Map<DocumentKey, MutableDocument> remoteDocuments = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
// Apply overlays to empty mutable document map to generate base. | ||
for (Map.Entry<DocumentKey, Overlay> entry : overlays.entrySet()) { | ||
remoteDocuments.put(entry.getKey(), MutableDocument.newInvalidDocument(entry.getKey())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the document overlay cache contains the most up to date version of the documents, we don't need to.
overlay.applyToLocalView(docEntry.getValue(), null, Timestamp.now()); | ||
Overlay overlay = overlays.get(docEntry.getKey()); | ||
Mutation mutation = overlay != null ? overlay.getMutation() : null; | ||
if (mutation != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
if (!results.containsKey(batchId)) { | ||
List<Document> documents = new ArrayList<>(); | ||
documents.add(document); | ||
results = results.insert(batchId, documents); | ||
} else { | ||
results.get(batchId).add(document); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* <p>Holds information about a mutation and the largest batch id in Firestore when the mutation was | ||
* created. | ||
*/ | ||
public class Overlay { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turned it into an AutoValue.
Better version being implemented in #3323 |
This PR adds mutation batch id support to the index backfiller. All documents in the same mutation batch are processed, even if it goes over the limit (the alternative is to add an offset for mutations, but batches are limited to 500 writes).
Still todo: