-
Notifications
You must be signed in to change notification settings - Fork 624
Index-Free (2/6): Add update_time to SQLRemoteDocumentCache schema #615
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -36,6 +37,12 @@ public Timestamp getTimestamp() { | |||
return timestamp; | |||
} | |||
|
|||
/** Returns the microseconds since EPOCH that this snapshot version represents. */ | |||
public long toMicroseconds() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Historically we've tried to avoid baking in this assumption about backend timestamp resolution into our system. This is because the backend currently truncates this time but they have an open bug for fixing this.
Meanwhile we've already established a precedent for how to represent timestamps literally in our schema. For example, the targets table includes the following:
+ "snapshot_version_seconds INTEGER, "
+ "snapshot_version_nanos INTEGER, "
It seems like we lose nothing by representing these accurately--even if we only query on seconds we'll be fine for the purposes of index-free querying--so it seems like we should follow the established precedent and represent timestamps at full resolution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated this PR to use the same seconds/nanos approach as the targets table. I didn't want to do this originally cause it would result in some overselection, but I assume that you are correct when you say that we don't need to worry about this here.
@@ -351,6 +355,16 @@ private void addSequenceNumber() { | |||
} | |||
} | |||
|
|||
private void addUpdateTime() { | |||
if (!tableContainsColumn("remote_documents", "snapshot_version_seconds")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there are multiple versions of a document, it'd probably be better to follow the name used on the backend and call this the update_time_seconds
-- that way it's clear what the correspondence is with the proto fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Future me will thank you for this :)
"Table contained snapshot_version_seconds, but is missing snapshot_version_nanos"); | ||
db.execSQL("ALTER TABLE remote_documents ADD COLUMN snapshot_version_seconds INTEGER"); | ||
db.execSQL("ALTER TABLE remote_documents ADD COLUMN snapshot_version_nanos INTEGER"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If all the documents in the cache have a zero version doesn't that mean the query to find recently updated documents has to also check for zero? Doesn't that nearly nullify the benefits of index-free on any device that has pre-existing data?
Consider an existing device containing two targets, A updated at T1 matching D1 and D2. There's also a second target, listened to later: B updated at T2 matching D3. D3 actually matches the query for target A. All of this happened before this schema upgrade, so after the upgrade they'll all have update_time of zero.
If the user queries on A again we need to find D3, but it's going to have a zero update_time.
I think we're better off either:
- actually populating update_time based on the value from inside the document.
- dropping the contents of the remote_documents table
Though the latter is pretty unfriendly so I think we should only consider it if all other possibilities are exhausted.
Alternatively, maybe you have something up your sleeve to handle this case that I'm not seeing :-).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to describe my strategy for this in the PR comment, but added it as a comment in the code itself for more visibility. Basically - I haven't quite decided what the best way forward it as migrating the contents of the entire RemoteDocumentCache could be quite slow. I decided to punt on this till later (when we actually want to enable this).
/test device-check-changed |
1 similar comment
/test device-check-changed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This is the second of 6 PRs for "index-free" querying.
This PR modifies the SQL schema to add the update time to the RemoteDocumentCache. There is currently no migration that populates the information - instead, the consuming code will handle NULL value.
We can discuss a migration strategy later.