-
Notifications
You must be signed in to change notification settings - Fork 948
Collection Group Queries w/ indexing #1440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the whole I think this is great.
One thing that's missing is a recording of the collection-group-parent entries for pending mutations. We need to index those too otherwise these queries won't work while offline. We probably need some tests that verify that they do work while offline.
packages/firestore/src/core/query.ts
Outdated
@@ -176,12 +174,31 @@ export class Query { | |||
); | |||
} | |||
|
|||
/** | |||
* Helper to convert a Collection Group query into a collection query at a | |||
* specific path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem like an intrinsically useful operation. Maybe add a comment about what it's useful for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, done.
* path for root-level collections). Index entries can be retrieved via | ||
* getCollectionParents(). | ||
*/ | ||
indexCollectionParent( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name is slightly confusing because we're using "index" as both a noun and a verb in this context. I initially thought this might be a getter but was confused by the return type. Some other verb might help with that.
Maybe ensureCollectionParentIndexed? addToCollectionParentIndex?
However, see above. Maybe this doesn't need to be so specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. How about addCollectionParentEntry(), trying to be consistent with our RemoteDocumentCache.addEntry() naming? I don't care strongly though.
packages/firestore/src/core/query.ts
Outdated
if (DocumentKey.isDocumentKey(this.path)) { | ||
if (this.collectionGroup !== null) { | ||
return ( | ||
this.collectionGroup === docPath.secondToLastSegment() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"secondToLastSegment" is a wonky thing to have on here.
Maybe add a hasCollectionId(this.collectinoGroup)
method on documentPath?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I added the method to DocumentKey, since it wouldn't always apply to ResourcePath.
collectionPath: ResourcePath | ||
): PersistencePromise<void> { | ||
assert(collectionPath.length >= 1, 'Invalid collection path.'); | ||
const collectionId = collectionPath.lastSegment(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to essentially duplicate the MemoryQueryIndexes implementation. Couldn't we just reuse that here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maaaaybe?
It is very similar code, but I can't think of a clean way to reuse it that wouldn't involve creating an awkward abstraction just for the sake of code reuse, and I hate doing that. It's tempting to just use MemoryQueryIndexes directly, but I worry that'll be really awkward / confusing in the future when we have more indexes...
It's worth noting that here (unlike MemoryQueryIndexes) we only use the in-memory cache of indexes in the write path (to avoid re-writing indexes we know exist). We don't use it in the read path since we never populate it with existing index entries from disk. So although the data structure is the same, we use it differently.
Anyway, if you have a suggestion let me know, else I'm inclined to just live with the duplication for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up reworking this, extracting MemoryCollectionParentIndex
as a standalone class used by MemoryIndexManager
so that I could reuse it in both the IndexedDbIndexManager
as well as in the schema migration that back-populates indexes.
return collectionParentsStore(transaction) | ||
.loadAll(range) | ||
.next(entries => { | ||
for (const { parent } of entries) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the name of this construct so I can read more about it? (Not asking for any change--just want to understand this better.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Object destructuring. https://basarat.gitbooks.io/typescript/docs/destructuring.html
One other thought: could it be useful to have a possibly disabled by default test that exercises collection groups in combination with filters/order by? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick review! I've fixed the small, easy things, and commented on the rest of the feedback. I'll ping you in an hour or so to two to see if we can chat through the unresolved high-level feedback, in particular the relationship of QueryIndexes to the rest of the components.
return collectionParentsStore(transaction) | ||
.loadAll(range) | ||
.next(entries => { | ||
for (const { parent } of entries) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Object destructuring. https://basarat.gitbooks.io/typescript/docs/destructuring.html
packages/firestore/src/core/query.ts
Outdated
@@ -176,12 +174,31 @@ export class Query { | |||
); | |||
} | |||
|
|||
/** | |||
* Helper to convert a Collection Group query into a collection query at a | |||
* specific path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, done.
collectionPath: ResourcePath | ||
): PersistencePromise<void> { | ||
assert(collectionPath.length >= 1, 'Invalid collection path.'); | ||
const collectionId = collectionPath.lastSegment(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maaaaybe?
It is very similar code, but I can't think of a clean way to reuse it that wouldn't involve creating an awkward abstraction just for the sake of code reuse, and I hate doing that. It's tempting to just use MemoryQueryIndexes directly, but I worry that'll be really awkward / confusing in the future when we have more indexes...
It's worth noting that here (unlike MemoryQueryIndexes) we only use the in-memory cache of indexes in the write path (to avoid re-writing indexes we know exist). We don't use it in the read path since we never populate it with existing index entries from disk. So although the data structure is the same, we use it differently.
Anyway, if you have a suggestion let me know, else I'm inclined to just live with the duplication for now.
@@ -50,7 +51,10 @@ export class MemoryRemoteDocumentCache implements RemoteDocumentCache { | |||
* @param sizer Used to assess the size of a document. For eager GC, this is expected to just | |||
* return 0 to avoid unnecessarily doing the work of calculating the size. | |||
*/ | |||
constructor(private readonly sizer: DocumentSizer) {} | |||
constructor( | |||
private readonly queryIndexes: QueryIndexes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW the fullness of time is very fuzzy to me. I was imagining that we'd have a UnifiedDocumentCache component that would manage mutations and remote documents, and so it could own QueryIndexes and initiate the appropriate index updates (based on old/new values, etc.). So until then, my plan was to just have RemoteDocumentCache and MutationQueue separately do the appropriate index updates (though I forgot the MutationQueue, oops).
But I think this is probably worth talking through in chat or in-person because I don't think I'm fully grokking your suggestion.
import { PersistencePromise } from './persistence_promise'; | ||
|
||
/** | ||
* Represents a set of indexes that are used to execute queries efficiently. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some comment text. I'm fuzzy on what this interface is going to look like over time though. I imagine we may end up removing indexCollectionParent() and instead have indexDocument(oldDoc, newDoc) or something (and it would implicitly index the collection parent as well).
/** | ||
* Represents a set of indexes that are used to execute queries efficiently. | ||
*/ | ||
export interface QueryIndexes { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not in love with it either. Your alternatives are the same ones I considered. :-)
I wasn't sure if Indexer was okay, since it also reads indexes (getCollectionParents()) and the rest of our persistence classes are nouns (QueryCache, RemoteDocumentCache, MutationQueue). So I think I'd slightly prefer IndexManager... but I'm kinda' allergic to managers...
I'm opposed to CollectionParentIndex because the file overhead is a pretty big pain and I'm not sure what the future of this all looks like (e.g. see comment about indexDocument(oldDoc, newDoc) above).
Anyway, I'll plan to rename this to IndexManager before final review unless you have a preference for something else.
* path for root-level collections). Index entries can be retrieved via | ||
* getCollectionParents(). | ||
*/ | ||
indexCollectionParent( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. How about addCollectionParentEntry(), trying to be consistent with our RemoteDocumentCache.addEntry() naming? I don't care strongly though.
Just to quickly summarize our discussion before we go away for a while:
|
* QueryIndexes => IndexManager * indexCollectionParent() => addToCollectionParentIndex()
@wilhuff I think this is ready for review. I did the renames we talked about, added tests, etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. A few more nits and we're good to go!
packages/firebase/index.d.ts
Outdated
@@ -886,6 +886,16 @@ declare namespace firebase.firestore { | |||
*/ | |||
doc(documentPath: string): DocumentReference; | |||
|
|||
/** | |||
* Gets a Query instance that will include documents from all collections and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Gets" is the wrong verb here. Elsewhere when talking about creating a new Query object we've written "Creates and returns a new Query".
Also, we've written "@return The created Query" instead of "The Query instance." below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to:
* Creates and returns a new Query that includes all documents in the
* database that are contained in a collection or subcollection with the
* given collectionId.
packages/firebase/index.d.ts
Outdated
* Gets a Query instance that will include documents from all collections and | ||
* subcollections in the database with the given collectionId. | ||
* | ||
* @param collectionId The collectionId specifying the group of collections to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"collectionId" is a term of art that's pretty specific to Firestore. Making this comment self-referential like this makes it more opaque than it needs to be. Could we include text here that indicates that a collectionId is the trailing component of a path to a collection? Also maybe note that collectionIds don't contain slashes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to:
* @param collectionId Identifies the collections to query over. Every
* collection or subcollection with this ID as the last segment of its path
* will be included. Cannot contain a slash.
@@ -0,0 +1,50 @@ | |||
/** | |||
* Copyright 2018 Google Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2019 LOL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* Represents a set of indexes that are used to execute queries efficiently. | ||
* | ||
* Currently the only index is a [collection id] => [parent path] index, used | ||
* to execute Collection Group queries. When we implement property indexing in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All code is subject to change as time progresses. I realize you're not wild about the fog of war surrounding our eventual goals with indexing seems as if this last sentence isn't adding much value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
if (query.collectionGroup !== null) { | ||
assert( | ||
path.length % 2 === 0, | ||
'Collection Group queries should be within a document path.' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Does the root of the database count as a document path? Maybe "within a document path or root"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done.
@@ -566,4 +567,124 @@ apiDescribe('Queries', persistence => { | |||
} | |||
}).to.throw(expectedError); | |||
}); | |||
|
|||
it('support collection groups', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When reading this as English I would expect this to read "it supports collection groups". Would it be reasonable to make these it('supports ...', ...)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh. I think the verb here should match up with the noun in the describe()
function so e.g. in this case the full test case reads "Queries support ..."
Following the example of other tests in this file I've chosen to side-step the issue with it('can query collection groups ...')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Nits resolved. I think this is good to go, pending backend durability.
packages/firebase/index.d.ts
Outdated
@@ -886,6 +886,16 @@ declare namespace firebase.firestore { | |||
*/ | |||
doc(documentPath: string): DocumentReference; | |||
|
|||
/** | |||
* Gets a Query instance that will include documents from all collections and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to:
* Creates and returns a new Query that includes all documents in the
* database that are contained in a collection or subcollection with the
* given collectionId.
packages/firebase/index.d.ts
Outdated
* Gets a Query instance that will include documents from all collections and | ||
* subcollections in the database with the given collectionId. | ||
* | ||
* @param collectionId The collectionId specifying the group of collections to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to:
* @param collectionId Identifies the collections to query over. Every
* collection or subcollection with this ID as the last segment of its path
* will be included. Cannot contain a slash.
@@ -0,0 +1,50 @@ | |||
/** | |||
* Copyright 2018 Google Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* Represents a set of indexes that are used to execute queries efficiently. | ||
* | ||
* Currently the only index is a [collection id] => [parent path] index, used | ||
* to execute Collection Group queries. When we implement property indexing in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
if (query.collectionGroup !== null) { | ||
assert( | ||
path.length % 2 === 0, | ||
'Collection Group queries should be within a document path.' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done.
@@ -566,4 +567,124 @@ apiDescribe('Queries', persistence => { | |||
} | |||
}).to.throw(expectedError); | |||
}); | |||
|
|||
it('support collection groups', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh. I think the verb here should match up with the noun in the describe()
function so e.g. in this case the full test case reads "Queries support ..."
Following the example of other tests in this file I've chosen to side-step the issue with it('can query collection groups ...')
@Feiyang1 Can you approve for the change to packages/firebase/index.d.ts? [note the change is actually commented-out for now because we can't expose the API yet] |
@wilhuff If you have time can you do an initial sanity-check review of this before the holiday? I was hoping to have it 100% ready, but have been too randomized. The feature works, but it's missing the schema migration to populate the index and I need to add more tests. In particular, this PR contains: