Skip to content

Index-Free (2/6): Add update_time to SQLRemoteDocumentCache schema #615

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jul 23, 2019

Conversation

schmidt-sebastian
Copy link
Contributor

This is the second of 6 PRs for "index-free" querying.

This PR modifies the SQL schema to add the update time to the RemoteDocumentCache. There is currently no migration that populates the information - instead, the consuming code will handle NULL value.

We can discuss a migration strategy later.

@schmidt-sebastian schmidt-sebastian changed the title Add update_time to SQLRemoteDocumentCache schema Index-Free (2/6): Add update_time to SQLRemoteDocumentCache schema Jul 15, 2019
@@ -36,6 +37,12 @@ public Timestamp getTimestamp() {
return timestamp;
}

/** Returns the microseconds since EPOCH that this snapshot version represents. */
public long toMicroseconds() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Historically we've tried to avoid baking in this assumption about backend timestamp resolution into our system. This is because the backend currently truncates this time but they have an open bug for fixing this.

Meanwhile we've already established a precedent for how to represent timestamps literally in our schema. For example, the targets table includes the following:

                  + "snapshot_version_seconds INTEGER, "
                  + "snapshot_version_nanos INTEGER, "

It seems like we lose nothing by representing these accurately--even if we only query on seconds we'll be fine for the purposes of index-free querying--so it seems like we should follow the established precedent and represent timestamps at full resolution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this PR to use the same seconds/nanos approach as the targets table. I didn't want to do this originally cause it would result in some overselection, but I assume that you are correct when you say that we don't need to worry about this here.

@wilhuff wilhuff assigned schmidt-sebastian and unassigned wilhuff Jul 18, 2019
@@ -351,6 +355,16 @@ private void addSequenceNumber() {
}
}

private void addUpdateTime() {
if (!tableContainsColumn("remote_documents", "snapshot_version_seconds")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there are multiple versions of a document, it'd probably be better to follow the name used on the backend and call this the update_time_seconds -- that way it's clear what the correspondence is with the proto fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Future me will thank you for this :)

"Table contained snapshot_version_seconds, but is missing snapshot_version_nanos");
db.execSQL("ALTER TABLE remote_documents ADD COLUMN snapshot_version_seconds INTEGER");
db.execSQL("ALTER TABLE remote_documents ADD COLUMN snapshot_version_nanos INTEGER");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all the documents in the cache have a zero version doesn't that mean the query to find recently updated documents has to also check for zero? Doesn't that nearly nullify the benefits of index-free on any device that has pre-existing data?

Consider an existing device containing two targets, A updated at T1 matching D1 and D2. There's also a second target, listened to later: B updated at T2 matching D3. D3 actually matches the query for target A. All of this happened before this schema upgrade, so after the upgrade they'll all have update_time of zero.

If the user queries on A again we need to find D3, but it's going to have a zero update_time.

I think we're better off either:

  • actually populating update_time based on the value from inside the document.
  • dropping the contents of the remote_documents table

Though the latter is pretty unfriendly so I think we should only consider it if all other possibilities are exhausted.

Alternatively, maybe you have something up your sleeve to handle this case that I'm not seeing :-).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to describe my strategy for this in the PR comment, but added it as a comment in the code itself for more visibility. Basically - I haven't quite decided what the best way forward it as migrating the contents of the entire RemoteDocumentCache could be quite slow. I decided to punt on this till later (when we actually want to enable this).

@schmidt-sebastian schmidt-sebastian changed the base branch from mrschmidt/indexfree-1 to master July 22, 2019 18:18
@schmidt-sebastian
Copy link
Contributor Author

/test device-check-changed

1 similar comment
@schmidt-sebastian
Copy link
Contributor Author

/test device-check-changed

Copy link
Contributor

@wilhuff wilhuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wilhuff wilhuff merged commit 6135c4a into master Jul 23, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cla: yes Override cla size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants