Skip to content

Collection Group Queries w/ indexing #1440

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 26 commits into from
Mar 8, 2019

Conversation

mikelehen
Copy link
Contributor

@wilhuff If you have time can you do an initial sanity-check review of this before the holiday? I was hoping to have it 100% ready, but have been too randomized. The feature works, but it's missing the schema migration to populate the index and I need to add more tests. In particular, this PR contains:

  • API surface area for Collection Group queries.
  • New QueryIndexes component to maintain a collectionParents index for CG queries.
  • The plumbing changes to serializer, Query execution, LocalDocumentsView, etc. to make CG queries work with the new index.

Copy link
Contributor

@wilhuff wilhuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the whole I think this is great.

One thing that's missing is a recording of the collection-group-parent entries for pending mutations. We need to index those too otherwise these queries won't work while offline. We probably need some tests that verify that they do work while offline.

@@ -176,12 +174,31 @@ export class Query {
);
}

/**
* Helper to convert a Collection Group query into a collection query at a
* specific path.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem like an intrinsically useful operation. Maybe add a comment about what it's useful for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, done.

* path for root-level collections). Index entries can be retrieved via
* getCollectionParents().
*/
indexCollectionParent(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is slightly confusing because we're using "index" as both a noun and a verb in this context. I initially thought this might be a getter but was confused by the return type. Some other verb might help with that.

Maybe ensureCollectionParentIndexed? addToCollectionParentIndex?

However, see above. Maybe this doesn't need to be so specific.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. How about addCollectionParentEntry(), trying to be consistent with our RemoteDocumentCache.addEntry() naming? I don't care strongly though.

if (DocumentKey.isDocumentKey(this.path)) {
if (this.collectionGroup !== null) {
return (
this.collectionGroup === docPath.secondToLastSegment() &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"secondToLastSegment" is a wonky thing to have on here.

Maybe add a hasCollectionId(this.collectinoGroup) method on documentPath?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I added the method to DocumentKey, since it wouldn't always apply to ResourcePath.

collectionPath: ResourcePath
): PersistencePromise<void> {
assert(collectionPath.length >= 1, 'Invalid collection path.');
const collectionId = collectionPath.lastSegment();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to essentially duplicate the MemoryQueryIndexes implementation. Couldn't we just reuse that here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maaaaybe?

It is very similar code, but I can't think of a clean way to reuse it that wouldn't involve creating an awkward abstraction just for the sake of code reuse, and I hate doing that. It's tempting to just use MemoryQueryIndexes directly, but I worry that'll be really awkward / confusing in the future when we have more indexes...

It's worth noting that here (unlike MemoryQueryIndexes) we only use the in-memory cache of indexes in the write path (to avoid re-writing indexes we know exist). We don't use it in the read path since we never populate it with existing index entries from disk. So although the data structure is the same, we use it differently.

Anyway, if you have a suggestion let me know, else I'm inclined to just live with the duplication for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up reworking this, extracting MemoryCollectionParentIndex as a standalone class used by MemoryIndexManager so that I could reuse it in both the IndexedDbIndexManager as well as in the schema migration that back-populates indexes.

return collectionParentsStore(transaction)
.loadAll(range)
.next(entries => {
for (const { parent } of entries) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the name of this construct so I can read more about it? (Not asking for any change--just want to understand this better.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wilhuff
Copy link
Contributor

wilhuff commented Dec 21, 2018

One other thought: could it be useful to have a possibly disabled by default test that exercises collection groups in combination with filters/order by?

Copy link
Contributor Author

@mikelehen mikelehen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick review! I've fixed the small, easy things, and commented on the rest of the feedback. I'll ping you in an hour or so to two to see if we can chat through the unresolved high-level feedback, in particular the relationship of QueryIndexes to the rest of the components.

return collectionParentsStore(transaction)
.loadAll(range)
.next(entries => {
for (const { parent } of entries) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -176,12 +174,31 @@ export class Query {
);
}

/**
* Helper to convert a Collection Group query into a collection query at a
* specific path.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, done.

collectionPath: ResourcePath
): PersistencePromise<void> {
assert(collectionPath.length >= 1, 'Invalid collection path.');
const collectionId = collectionPath.lastSegment();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maaaaybe?

It is very similar code, but I can't think of a clean way to reuse it that wouldn't involve creating an awkward abstraction just for the sake of code reuse, and I hate doing that. It's tempting to just use MemoryQueryIndexes directly, but I worry that'll be really awkward / confusing in the future when we have more indexes...

It's worth noting that here (unlike MemoryQueryIndexes) we only use the in-memory cache of indexes in the write path (to avoid re-writing indexes we know exist). We don't use it in the read path since we never populate it with existing index entries from disk. So although the data structure is the same, we use it differently.

Anyway, if you have a suggestion let me know, else I'm inclined to just live with the duplication for now.

@@ -50,7 +51,10 @@ export class MemoryRemoteDocumentCache implements RemoteDocumentCache {
* @param sizer Used to assess the size of a document. For eager GC, this is expected to just
* return 0 to avoid unnecessarily doing the work of calculating the size.
*/
constructor(private readonly sizer: DocumentSizer) {}
constructor(
private readonly queryIndexes: QueryIndexes,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW the fullness of time is very fuzzy to me. I was imagining that we'd have a UnifiedDocumentCache component that would manage mutations and remote documents, and so it could own QueryIndexes and initiate the appropriate index updates (based on old/new values, etc.). So until then, my plan was to just have RemoteDocumentCache and MutationQueue separately do the appropriate index updates (though I forgot the MutationQueue, oops).

But I think this is probably worth talking through in chat or in-person because I don't think I'm fully grokking your suggestion.

import { PersistencePromise } from './persistence_promise';

/**
* Represents a set of indexes that are used to execute queries efficiently.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comment text. I'm fuzzy on what this interface is going to look like over time though. I imagine we may end up removing indexCollectionParent() and instead have indexDocument(oldDoc, newDoc) or something (and it would implicitly index the collection parent as well).

/**
* Represents a set of indexes that are used to execute queries efficiently.
*/
export interface QueryIndexes {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not in love with it either. Your alternatives are the same ones I considered. :-)

I wasn't sure if Indexer was okay, since it also reads indexes (getCollectionParents()) and the rest of our persistence classes are nouns (QueryCache, RemoteDocumentCache, MutationQueue). So I think I'd slightly prefer IndexManager... but I'm kinda' allergic to managers...

I'm opposed to CollectionParentIndex because the file overhead is a pretty big pain and I'm not sure what the future of this all looks like (e.g. see comment about indexDocument(oldDoc, newDoc) above).

Anyway, I'll plan to rename this to IndexManager before final review unless you have a preference for something else.

* path for root-level collections). Index entries can be retrieved via
* getCollectionParents().
*/
indexCollectionParent(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. How about addCollectionParentEntry(), trying to be consistent with our RemoteDocumentCache.addEntry() naming? I don't care strongly though.

@wilhuff
Copy link
Contributor

wilhuff commented Dec 22, 2018

Just to quickly summarize our discussion before we go away for a while:

  • Let's leave QueryIndexes as a member of the remote document caches
  • We're doing so ensure that all remote document update paths are caught
  • Rename QueryIndexes to something else
  • Add an index for elements of the mutation queue
  • Add test coverage for mutations
  • Add offline tests that validate CG queries with other criteria execute correctly locally

@wilhuff wilhuff assigned mikelehen and unassigned wilhuff Dec 22, 2018
@mikelehen mikelehen changed the title Collection Group Queries w/ indexing [WIP -- not ready for submission] Collection Group Queries w/ indexing Feb 4, 2019
@mikelehen mikelehen assigned wilhuff and unassigned mikelehen Feb 4, 2019
@mikelehen
Copy link
Contributor Author

@wilhuff I think this is ready for review. I did the renames we talked about, added tests, etc.

Copy link
Contributor

@wilhuff wilhuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A few more nits and we're good to go!

@@ -886,6 +886,16 @@ declare namespace firebase.firestore {
*/
doc(documentPath: string): DocumentReference;

/**
* Gets a Query instance that will include documents from all collections and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Gets" is the wrong verb here. Elsewhere when talking about creating a new Query object we've written "Creates and returns a new Query".

Also, we've written "@return The created Query" instead of "The Query instance." below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to:

   * Creates and returns a new Query that includes all documents in the
   * database that are contained in a collection or subcollection with the
   * given collectionId.

* Gets a Query instance that will include documents from all collections and
* subcollections in the database with the given collectionId.
*
* @param collectionId The collectionId specifying the group of collections to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"collectionId" is a term of art that's pretty specific to Firestore. Making this comment self-referential like this makes it more opaque than it needs to be. Could we include text here that indicates that a collectionId is the trailing component of a path to a collection? Also maybe note that collectionIds don't contain slashes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to:

   * @param collectionId Identifies the collections to query over. Every
   * collection or subcollection with this ID as the last segment of its path
   * will be included. Cannot contain a slash.

@@ -0,0 +1,50 @@
/**
* Copyright 2018 Google Inc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2019 LOL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* Represents a set of indexes that are used to execute queries efficiently.
*
* Currently the only index is a [collection id] => [parent path] index, used
* to execute Collection Group queries. When we implement property indexing in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All code is subject to change as time progresses. I realize you're not wild about the fog of war surrounding our eventual goals with indexing seems as if this last sentence isn't adding much value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if (query.collectionGroup !== null) {
assert(
path.length % 2 === 0,
'Collection Group queries should be within a document path.'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Does the root of the database count as a document path? Maybe "within a document path or root"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done.

@@ -566,4 +567,124 @@ apiDescribe('Queries', persistence => {
}
}).to.throw(expectedError);
});

it('support collection groups', async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When reading this as English I would expect this to read "it supports collection groups". Would it be reasonable to make these it('supports ...', ...)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh. I think the verb here should match up with the noun in the describe() function so e.g. in this case the full test case reads "Queries support ..."

Following the example of other tests in this file I've chosen to side-step the issue with it('can query collection groups ...')

@wilhuff wilhuff assigned mikelehen and unassigned wilhuff Feb 4, 2019
Copy link
Contributor Author

@mikelehen mikelehen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Nits resolved. I think this is good to go, pending backend durability.

@@ -886,6 +886,16 @@ declare namespace firebase.firestore {
*/
doc(documentPath: string): DocumentReference;

/**
* Gets a Query instance that will include documents from all collections and
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to:

   * Creates and returns a new Query that includes all documents in the
   * database that are contained in a collection or subcollection with the
   * given collectionId.

* Gets a Query instance that will include documents from all collections and
* subcollections in the database with the given collectionId.
*
* @param collectionId The collectionId specifying the group of collections to
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to:

   * @param collectionId Identifies the collections to query over. Every
   * collection or subcollection with this ID as the last segment of its path
   * will be included. Cannot contain a slash.

@@ -0,0 +1,50 @@
/**
* Copyright 2018 Google Inc.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* Represents a set of indexes that are used to execute queries efficiently.
*
* Currently the only index is a [collection id] => [parent path] index, used
* to execute Collection Group queries. When we implement property indexing in
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if (query.collectionGroup !== null) {
assert(
path.length % 2 === 0,
'Collection Group queries should be within a document path.'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done.

@@ -566,4 +567,124 @@ apiDescribe('Queries', persistence => {
}
}).to.throw(expectedError);
});

it('support collection groups', async () => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh. I think the verb here should match up with the noun in the describe() function so e.g. in this case the full test case reads "Queries support ..."

Following the example of other tests in this file I've chosen to side-step the issue with it('can query collection groups ...')

mikelehen pushed a commit to firebase/firebase-ios-sdk that referenced this pull request Feb 14, 2019
mikelehen pushed a commit to firebase/firebase-ios-sdk that referenced this pull request Feb 14, 2019
@mikelehen
Copy link
Contributor Author

@Feiyang1 Can you approve for the change to packages/firebase/index.d.ts? [note the change is actually commented-out for now because we can't expose the API yet]

@mikelehen mikelehen merged commit 21c0f3c into master Mar 8, 2019
@mikelehen mikelehen deleted the mikelehen/collection-group-queries branch March 8, 2019 18:37
@firebase firebase locked and limited conversation to collaborators Oct 14, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants