
Index-Free (2/6): Add update_time to SQLRemoteDocumentCache schema #615

Merged: 8 commits, Jul 23, 2019
@@ -17,6 +17,7 @@
 import static com.google.firebase.firestore.util.Assert.fail;
 import static com.google.firebase.firestore.util.Assert.hardAssert;

+import com.google.firebase.Timestamp;
 import com.google.firebase.database.collection.ImmutableSortedMap;
 import com.google.firebase.firestore.core.Query;
 import com.google.firebase.firestore.model.Document;
@@ -51,13 +52,18 @@ final class SQLiteRemoteDocumentCache implements RemoteDocumentCache {
   @Override
   public void add(MaybeDocument maybeDocument) {
     String path = pathForKey(maybeDocument.getKey());
+    Timestamp timestamp = maybeDocument.getVersion().getTimestamp();
     MessageLite message = serializer.encodeMaybeDocument(maybeDocument);

     statsCollector.recordRowsWritten(STATS_TAG, 1);

     db.execute(
-        "INSERT OR REPLACE INTO remote_documents (path, contents) VALUES (?, ?)",
+        "INSERT OR REPLACE INTO remote_documents "
+            + "(path, update_time_seconds, update_time_nanos, contents) "
+            + "VALUES (?, ?, ?, ?)",
         path,
+        timestamp.getSeconds(),
+        timestamp.getNanoseconds(),
         message.toByteArray());

     db.getIndexManager().addToCollectionParentIndex(maybeDocument.getKey().getPath().popLast());
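The new columns split the document's version timestamp into whole seconds and a nanosecond remainder. The following is a rough sketch of how index-free reads could use them, modeled in Python with sqlite3 rather than the Android SQLite API; the table and column names come from the diff above, but the "changed since" query is an assumption about how later PRs in this series would consume the columns:

```python
import sqlite3

# In-memory stand-in for the remote_documents table after this PR's schema change.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE remote_documents ("
    "path TEXT PRIMARY KEY, "
    "update_time_seconds INTEGER, "
    "update_time_nanos INTEGER, "
    "contents BLOB)"
)

def add(path, seconds, nanos, contents):
    # Mirrors SQLiteRemoteDocumentCache.add(): the version timestamp is
    # written into the two new integer columns alongside the encoded document.
    db.execute(
        "INSERT OR REPLACE INTO remote_documents "
        "(path, update_time_seconds, update_time_nanos, contents) "
        "VALUES (?, ?, ?, ?)",
        (path, seconds, nanos, contents),
    )

add("coll/doc1", 100, 0, b"...")
add("coll/doc2", 100, 500, b"...")
add("coll/doc3", 200, 0, b"...")

# Hypothetical index-free read: fetch only documents updated strictly after
# (seconds=100, nanos=0), comparing the (seconds, nanos) pair lexicographically.
rows = db.execute(
    "SELECT path FROM remote_documents "
    "WHERE update_time_seconds > ? "
    "OR (update_time_seconds = ? AND update_time_nanos > ?)",
    (100, 100, 0),
).fetchall()
```

Comparing the `(seconds, nanos)` pair lexicographically matches how the timestamps themselves order, which is why two plain integer columns are enough here.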
@@ -45,8 +45,16 @@ class SQLiteSchema {
   /**
    * The version of the schema. Increase this by one for each migration added to runMigrations
    * below.
+   *
+   * <p>TODO(index-free): The migration to schema version 9 doesn't backfill `update_time`, as
+   * this requires rewriting the RemoteDocumentCache. For index-free queries to efficiently
+   * handle existing documents, we still need to populate update_time for all existing
+   * entries, drop the RemoteDocumentCache, or ask users to invoke `clearPersistence()`
+   * manually. If we decide to backfill or drop the contents of the RemoteDocumentCache, we
+   * need to perform an additional schema migration.
    */
-  static final int VERSION = 8;
+  static final int VERSION = 9;

   // Remove this constant and increment VERSION to enable indexing support.
   static final int INDEXING_SUPPORT_VERSION = VERSION + 1;
@@ -127,6 +135,10 @@ void runMigrations(int fromVersion, int toVersion) {
       createV8CollectionParentsIndex();
     }

+    if (fromVersion < 9 && toVersion >= 9) {
+      addUpdateTime();
+    }
+
     /*
      * Adding a new migration? READ THIS FIRST!
      *
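The version-range guard added above is what makes each migration run exactly once, whatever version a client upgrades from. A toy model of the pattern (Python; the migration names mirror the Java code, the dispatcher itself is only illustrative):

```python
def run_migrations(from_version, to_version, applied):
    # Each guard mirrors the Java pattern: a migration runs only when the
    # upgrade crosses its version boundary, so it never runs twice.
    if from_version < 8 and to_version >= 8:
        applied.append("createV8CollectionParentsIndex")
    if from_version < 9 and to_version >= 9:
        applied.append("addUpdateTime")
    return applied

# A fresh install jumping straight to 9 runs both migrations; a client
# already on schema 8 runs only the new one.
fresh = run_migrations(0, 9, [])
upgrade = run_migrations(8, 9, [])
```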
@@ -351,6 +363,16 @@ private void addSequenceNumber() {
     }
   }

+  private void addUpdateTime() {
+    if (!tableContainsColumn("remote_documents", "update_time_seconds")) {
+      hardAssert(
+          !tableContainsColumn("remote_documents", "update_time_nanos"),
+          "Table contains update_time_nanos but is missing update_time_seconds");
+      db.execSQL("ALTER TABLE remote_documents ADD COLUMN update_time_seconds INTEGER");
+      db.execSQL("ALTER TABLE remote_documents ADD COLUMN update_time_nanos INTEGER");
+    }
Contributor:
If all the documents in the cache have a zero version doesn't that mean the query to find recently updated documents has to also check for zero? Doesn't that nearly nullify the benefits of index-free on any device that has pre-existing data?

Consider an existing device containing two targets: A, updated at T1 and matching D1 and D2, and B, listened to later, updated at T2 and matching D3. D3 actually matches the query for target A. All of this happened before this schema upgrade, so after the upgrade they'll all have an update_time of zero.

If the user queries on A again we need to find D3, but it's going to have a zero update_time.

I think we're better off either:

  • actually populating update_time based on the value from inside the document.
  • dropping the contents of the remote_documents table

Though the latter is pretty unfriendly so I think we should only consider it if all other possibilities are exhausted.

Alternatively, maybe you have something up your sleeve to handle this case that I'm not seeing :-).

Contributor Author:
I tried to describe my strategy for this in the PR comment, but added it as a comment in the code itself for more visibility. Basically, I haven't quite decided what the best way forward is, as migrating the contents of the entire RemoteDocumentCache could be quite slow. I decided to punt on this until later (when we actually want to enable this).
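The first option the reviewer raises, backfilling update_time from the stored document contents, might look roughly like the following sketch (Python with sqlite3; `decode_version` is a hypothetical stand-in for deserializing the cached MaybeDocument proto and reading its version, which is exactly the per-row work that makes this migration potentially slow):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE remote_documents (path TEXT PRIMARY KEY, "
    "update_time_seconds INTEGER, update_time_nanos INTEGER, contents BLOB)"
)
# Rows as the v9 migration leaves them: new columns present but NULL.
db.execute("INSERT INTO remote_documents VALUES (?, NULL, NULL, ?)", ("coll/d1", b"payload1"))
db.execute("INSERT INTO remote_documents VALUES (?, NULL, NULL, ?)", ("coll/d2", b"payload2"))

def decode_version(contents):
    # Hypothetical: deserialize the cached document and return its version as
    # a (seconds, nanos) pair. In the real cache this means decoding a proto.
    return {b"payload1": (100, 0), b"payload2": (200, 500)}[contents]

# Backfill pass: find rows the migration left with NULL timestamps and
# populate the new columns from the decoded document version.
rows = db.execute(
    "SELECT path, contents FROM remote_documents WHERE update_time_seconds IS NULL"
).fetchall()
for path, contents in rows:
    seconds, nanos = decode_version(contents)
    db.execute(
        "UPDATE remote_documents SET update_time_seconds = ?, update_time_nanos = ? "
        "WHERE path = ?",
        (seconds, nanos, path),
    )

backfilled = db.execute(
    "SELECT path, update_time_seconds, update_time_nanos FROM remote_documents ORDER BY path"
).fetchall()
```

This is only a model of the trade-off under discussion: the UPDATE itself is cheap, but every row must be read and decoded first, which is why the thread also considers dropping the table or deferring the decision.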

+  }

/**
* Ensures that each entry in the remote document cache has a corresponding sentinel row. Any
* entries that lack a sentinel row are given one with the sequence number set to the highest