Skip to content

Use Read-Time Index for Multi-Tab #2128

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

schmidt-sebastian
Copy link
Contributor

@schmidt-sebastian schmidt-sebastian commented Aug 30, 2019

This uses the read time we persist for Index-Free queries in the change log for multi-tab. The big benefit is that we no longer need to GC the change log (since LRU GC does it for us) and we no longer need to recover from GCed change logs. The downside is that we now need to store NoDocuments for deleted rows, but these are internal to the IndexedDbRemoteDocumentStore (I added a new option to the RemoteDocumentChangeBuffer for that).

This is a follow-up to #2125

@@ -1144,7 +1144,7 @@ describeSpec('Listens:', [], () => {
.client(1)
.stealPrimaryLease()
.expectListen(query, 'resume-token-2000')
.watchAcksFull(query, 2000, docC)
.watchAcksFull(query, 3000, docC)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changed since we can use the read time to get a change diff, rather than a separate counter.

schmidt-sebastian added a commit that referenced this pull request Aug 30, 2019
This should fix the CI failure in #2128
schmidt-sebastian added a commit that referenced this pull request Aug 30, 2019
This should fix the CI failure in #2128
mikelehen pushed a commit that referenced this pull request Aug 30, 2019
This should fix the CI failure in #2128
Copy link
Contributor

@mikelehen mikelehen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM but a few nits and I think we should sanity-check with @wilhuff that we're okay with web storing sentinel delete documents that iOS / Android do not.

@@ -1237,7 +1211,9 @@ export class IndexedDbLruDelegate implements ReferenceDelegate, LruDelegate {
upperBound: ListenSequenceNumber
): PersistencePromise<number> {
const documentCache = this.db.getRemoteDocumentCache();
const changeBuffer = documentCache.newChangeBuffer();
const changeBuffer = documentCache.newChangeBuffer({
createSentinelDocumentsToTrackDeletes: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps change "Deletes" => "Removals" or "RemovedEntries" since this corresponds to removeEntry() calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (trackRemovals as per later comment)

@@ -486,15 +447,22 @@ export class IndexedDbRemoteDocumentCache implements RemoteDocumentCache {
promises.push(this.documentCache.addEntry(transaction, key, doc));
} else {
sizeDelta -= previousSize!;
promises.push(this.documentCache.removeEntry(transaction, key));
if (this.trackDeletes) {
// A sentinel delete is symbolized by a NoDocument with a version of 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe expand on this comment?

In order to track deletes, we store "sentinel delete" entries in the Remote Document Cache. These entries are represented by a NoDocument with a version of 0 and are ignored by maybeDecodeDocument() but preserved in getNewDocumentChanges().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks for the comment :)

newChangeBuffer(): RemoteDocumentChangeBuffer {
return new IndexedDbRemoteDocumentCache.RemoteDocumentChangeBuffer(this);
newChangeBuffer(options: {
createSentinelDocumentsToTrackDeletes: boolean;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. You say "sentinelDocuments" here but refer to them as "sentinel deletes" elsewhere. Please use consistent terminology.
  2. We have this long name here but just trackDeletes in the RemoteDocumentChangeBuffer constructor. I am not sure which I prefer, but I can't think of a great reason for them to be different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you seem to be ok with the shorter name, I opted for "trackRemovals" everywhere.

DbClientMetadata.store,
DbRemoteDocumentChanges.store
];
export const V4_STORES = [...V3_STORES, DbClientMetadata.store];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm... I'm having uneasy feelings about revising history like this. I wonder if we should keep some hint about the prior existence of this store to keep our history of schema changes in-tact in this file. I'm not sure though. It also seems strange to keep dead code. Maybe ask @wilhuff what he thinks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per our offline discussion, let's keep this as is. I added a comment about the prior existence of this store. Theoretically though it should not matter since we drop the store, so we can recreate it with the same name and use for a different purpose later.

@@ -345,7 +345,9 @@ export class LocalStore {
'readwrite-primary',
txn => {
const affected = batchResult.batch.keys();
const documentBuffer = this.remoteDocuments.newChangeBuffer();
const documentBuffer = this.remoteDocuments.newChangeBuffer({
createSentinelDocumentsToTrackDeletes: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of me wonders if this should be the default behavior and GC should explicitly opt-out (e.g. flip the logic and rename to suppressSentinelDeleteDocuments or something) since reading this code, I would have trouble guessing what this is for and why it is enabled in this code path. By contrast, the GC code could have a comment explaining why we don't want to track deletes in that case.

Alternatively, since this option probably corresponds 100% to whether this is GC code or not, we could consider specializing the name to match (e.g. forGarbageCollection: true or something)

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little torn on this. I think the current naming makes the behavior more explicit (compared to forGarbageCollection) and I am not a big fan of flipping the behavior since we are turning on an extra feature after all. I do see your point though and can probably be won over somewhat easily.

For now, I made the argument optional and added a comment in the two places that requires us to turn trackRemovals on.

@@ -177,7 +178,9 @@ export class MemoryRemoteDocumentCache implements RemoteDocumentCache {
return PersistencePromise.resolve(changedDocs);
}

newChangeBuffer(): RemoteDocumentChangeBuffer {
newChangeBuffer(options: {
createSentinelDocumentsToTrackDeletes: boolean;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps there should be a comment indicating we explicitly ignore this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. This makes me thing we should never have added the change log to the MemoryRemoteDocumentCache.

}
)
.next(() => {
// The old documents are not included since readTime was not populated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice if this test added some documents with index entries but that were older than version=1 to verify that are not included either...

Copy link
Contributor Author

@schmidt-sebastian schmidt-sebastian Sep 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I extracted this into a separate test ("can get recent document changes") that does.

@schmidt-sebastian schmidt-sebastian changed the base branch from mrschmidt/persistreadtime to mrschmidt/indexfree-master September 4, 2019 19:07
@schmidt-sebastian schmidt-sebastian force-pushed the mrschmidt/multitabreadtime branch from fc9ed49 to 8142123 Compare September 4, 2019 19:09
@schmidt-sebastian
Copy link
Contributor Author

schmidt-sebastian commented Sep 4, 2019

I used "Sebastian's guide to screw with your commit history for this PR". The review changes are part of 8142123

Ugh, I totally screwed this up. Do you mind reviewing the full diff again? I also had to add the readTime as an argument for removeEntry() in order to serialize a proper NoDocument event.

Copy link
Contributor

@mikelehen mikelehen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM but in order to be consistent with my previous feedback, I'd like to see if we can remove the version number from removeEntry()... again. 😬

changeBuffer.removeEntry(
docKey,
SnapshotVersion.forDeletedDoc()
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per chat conversation, I will not stand for removeEntry() requiring a version that is MEANINGLESS!!! j/k... But I am realizing that there is exactly one place where we want to track removed entries and that is for failed limbo resolution, so I'm wondering if we should:

  1. Remove the trackRemovals option.
  2. Make it so removeEntry() doesn't require a version.
  3. Let you optionally pass an entry in. This could be removeEntry(key, version) or if we want to make it a little more explicit (and borrow a bit from the existing shape of the API), maybe removeEntry(key, {trackRemovalVersion: version}) or something.

Alternatively, since this such a specific case, we could consider not storing the sentinel delete and inventing a localStorage notification for it instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As always, this didn't quite work out in practice :/ If I do want to do this, then I have to track the readTime per document. If our semantic are going to be "when a read time is provided, create a fake NoDocument" then we should allow for the fact that some removals have readTimes while others do not.

I still made the second argument optional. If no read time is provided when it is needed, the assert in get readTime() should provide some insight.

@@ -131,24 +110,18 @@ export class IndexedDbRemoteDocumentCache implements RemoteDocumentCache {
}

/**
* Updates the document change log and adds the given delta to the cached current size.
* Updates the current size.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe "current cache size" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

newChangeBuffer(): RemoteDocumentChangeBuffer {
return new IndexedDbRemoteDocumentCache.RemoteDocumentChangeBuffer(this);
newChangeBuffer(options?: {
trackRemovals: boolean;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per feedback elsewhere, I recommend dropping this option and "moving" it to the removeEntry() call.

@@ -617,6 +630,12 @@ export class DbRemoteDocument {

static collectionReadTimeIndexPath = ['parentPath', 'readTime'];

// TODO: We are currently storing full document keys almost three times
// (once as part of the primary key, once - partly - as `parentPath` and once
// inside the encoded documents). During our next migration, we should
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We store it 3 more times if you include the indexes (collectionReadTimeIndex stores it twice, readTimeIndex stores it once). Hopefully people don't use deeply-nested subcollections and small documents!

It occurs to me that we could maintain our own collectionReadTimeIndex that instead of being a true index was just an object store containing {collectionPath, readTime, documentId} objects. This would save us storing the full documentKey. It's probably not worth doing for web since it'd be less efficient than a real index, but with LevelDB (where we have to implement our own indexes anyway), it probably makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So much redundancy! So much uptime!

I am planning on using that format for LevelDB, but I prefer using existing functionality here even if it is slightly less storage efficient.

Copy link
Contributor

@mikelehen mikelehen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after merging #2145

@schmidt-sebastian schmidt-sebastian merged commit 06e6c50 into mrschmidt/indexfree-master Sep 5, 2019
@firebase firebase locked and limited conversation to collaborators Oct 8, 2019
@schmidt-sebastian schmidt-sebastian deleted the mrschmidt/multitabreadtime branch October 10, 2019 22:55
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants