Collection Group Queries w/ indexing #1440

mikelehen · 2018-12-21T16:37:26Z

@wilhuff If you have time can you do an initial sanity-check review of this before the holiday? I was hoping to have it 100% ready, but have been too randomized. The feature works, but it's missing the schema migration to populate the index and I need to add more tests. In particular, this PR contains:

API surface area for Collection Group queries.
New QueryIndexes component to maintain a collectionParents index for CG queries.
The plumbing changes to serializer, Query execution, LocalDocumentsView, etc. to make CG queries work with the new index.

wilhuff

On the whole I think this is great.

One thing that's missing is a recording of the collection-group-parent entries for pending mutations. We need to index those too otherwise these queries won't work while offline. We probably need some tests that verify that they do work while offline.

wilhuff · 2018-12-21T17:13:16Z

packages/firestore/src/core/query.ts

@@ -176,12 +174,31 @@ export class Query {
    );
  }

+  /**
+   * Helper to convert a Collection Group query into a collection query at a
+   * specific path.


This doesn't seem like an intrinsically useful operation. Maybe add a comment about what it's useful for?

Good call, done.
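For readers following along, here is a minimal sketch of why such a helper is useful: a Collection Group query can be executed locally by expanding it into one ordinary collection query per indexed parent path. The types and function names below are hypothetical simplifications (paths are plain string arrays), not the SDK's actual classes.

```typescript
// Hypothetical, simplified types: a path is an array of segments.
type Path = string[];

interface SimpleQuery {
  path: Path; // For a CG query: the ancestor path to search under.
  collectionGroup: string | null; // Non-null marks a Collection Group query.
}

/** Rewrites a Collection Group query as a plain collection query at `collectionPath`. */
function asCollectionQueryAtPath(
  query: SimpleQuery,
  collectionPath: Path
): SimpleQuery {
  if (query.collectionGroup === null) {
    throw new Error('Expected a Collection Group query.');
  }
  return { path: collectionPath, collectionGroup: null };
}

/** Expands a CG query into one collection query per indexed parent path. */
function expandCollectionGroupQuery(
  query: SimpleQuery,
  parents: Path[]
): SimpleQuery[] {
  const collectionId = query.collectionGroup;
  if (collectionId === null) {
    throw new Error('Expected a Collection Group query.');
  }
  // Each parent path plus the collection ID names one concrete collection.
  return parents.map(parent =>
    asCollectionQueryAtPath(query, [...parent, collectionId])
  );
}

const cgQuery: SimpleQuery = { path: [], collectionGroup: 'landmarks' };
const expanded = expandCollectionGroupQuery(cgQuery, [
  ['cities', 'SF'],
  ['cities', 'Tokyo']
]);
```

In this sketch the list of parents is passed in directly; in the PR it would come from the collection-parent index.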

wilhuff · 2018-12-21T17:18:47Z

packages/firestore/src/local/query_indexes.ts

+   * path for root-level collections). Index entries can be retrieved via
+   * getCollectionParents().
+   */
+  indexCollectionParent(


This name is slightly confusing because we're using "index" as both a noun and a verb in this context. I initially thought this might be a getter but was confused by the return type. Some other verb might help with that.

Maybe ensureCollectionParentIndexed? addToCollectionParentIndex?

However, see above. Maybe this doesn't need to be so specific.

Good point. How about addCollectionParentEntry(), trying to be consistent with our RemoteDocumentCache.addEntry() naming? I don't care strongly though.

wilhuff · 2018-12-21T18:05:51Z

packages/firestore/src/core/query.ts

-    if (DocumentKey.isDocumentKey(this.path)) {
+    if (this.collectionGroup !== null) {
+      return (
+        this.collectionGroup === docPath.secondToLastSegment() &&


"secondToLastSegment" is a wonky thing to have on here.

Maybe add a hasCollectionId(this.collectionGroup) method on documentPath?

Done. I added the method to DocumentKey, since it wouldn't always apply to ResourcePath.
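A rough sketch of what such a check could look like, assuming a document path is modeled as a plain array of segments (the real DocumentKey class wraps a ResourcePath; this free function is only illustrative):

```typescript
/**
 * Returns true if the document at `documentPath` lives directly in a
 * collection whose ID is `collectionId`. Document paths alternate
 * collection and document segments, so the owning collection's ID is the
 * second-to-last segment.
 */
function hasCollectionId(documentPath: string[], collectionId: string): boolean {
  return (
    documentPath.length >= 2 &&
    documentPath[documentPath.length - 2] === collectionId
  );
}

const bridge = ['cities', 'SF', 'landmarks', 'bridge'];
const inLandmarks = hasCollectionId(bridge, 'landmarks');
const inCities = hasCollectionId(bridge, 'cities');
```

Putting the check on the document key rather than on a general path matches the point above: "second-to-last segment" only has this meaning for paths that point at documents.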

wilhuff · 2018-12-21T18:21:14Z

packages/firestore/src/local/indexeddb_query_indexes.ts

+    collectionPath: ResourcePath
+  ): PersistencePromise<void> {
+    assert(collectionPath.length >= 1, 'Invalid collection path.');
+    const collectionId = collectionPath.lastSegment();


This seems to essentially duplicate the MemoryQueryIndexes implementation. Couldn't we just reuse that here?

Maaaaybe?

It is very similar code, but I can't think of a clean way to reuse it that wouldn't involve creating an awkward abstraction just for the sake of code reuse, and I hate doing that. It's tempting to just use MemoryQueryIndexes directly, but I worry that'll be really awkward / confusing in the future when we have more indexes...

It's worth noting that here (unlike MemoryQueryIndexes) we only use the in-memory cache of indexes in the write path (to avoid re-writing indexes we know exist). We don't use it in the read path since we never populate it with existing index entries from disk. So although the data structure is the same, we use it differently.

Anyway, if you have a suggestion let me know, else I'm inclined to just live with the duplication for now.

I ended up reworking this, extracting MemoryCollectionParentIndex as a standalone class used by MemoryIndexManager so that I could reuse it in both the IndexedDbIndexManager as well as in the schema migration that back-populates indexes.
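To make the shared data structure concrete, here is a minimal sketch of an in-memory [collection id] => [parent paths] index. The class name, method names, and the use of slash-joined strings for parent paths are assumptions for illustration; the extracted MemoryCollectionParentIndex in the PR may differ.

```typescript
/** Illustrative in-memory index from collection ID to the set of parent paths. */
class CollectionParentIndexSketch {
  private readonly index = new Map<string, Set<string>>();

  /**
   * Records the parent of `collectionPath`. Returns true if the entry is new,
   * which lets a persistent implementation skip re-writing known entries.
   */
  add(collectionPath: string[]): boolean {
    if (collectionPath.length < 1) {
      throw new Error('Invalid collection path.');
    }
    const collectionId = collectionPath[collectionPath.length - 1];
    // Root-level collections get the empty path as their parent.
    const parent = collectionPath.slice(0, -1).join('/');
    let parents = this.index.get(collectionId);
    if (parents === undefined) {
      parents = new Set<string>();
      this.index.set(collectionId, parents);
    }
    const added = !parents.has(parent);
    parents.add(parent);
    return added;
  }

  /** Returns all known parent paths for `collectionId`. */
  getEntries(collectionId: string): string[] {
    return Array.from(this.index.get(collectionId) || []);
  }
}

const parentIndex = new CollectionParentIndexSketch();
const firstAdd = parentIndex.add(['cities', 'SF', 'landmarks']);
const secondAdd = parentIndex.add(['cities', 'SF', 'landmarks']);
const rootAdd = parentIndex.add(['landmarks']);
```

The boolean returned by `add` is what supports the write-path usage described above: a disk-backed manager can write to the store only when the entry was not already cached.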

wilhuff · 2018-12-21T18:22:49Z

packages/firestore/src/local/indexeddb_query_indexes.ts

+    return collectionParentsStore(transaction)
+      .loadAll(range)
+      .next(entries => {
+        for (const { parent } of entries) {


What's the name of this construct so I can read more about it? (Not asking for any change--just want to understand this better.)

Object destructuring. https://basarat.gitbooks.io/typescript/docs/destructuring.html
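A small standalone illustration of the construct, using a hypothetical entry shape rather than the actual IndexedDB schema type:

```typescript
// Hypothetical shape of an index entry, for illustration only.
interface CollectionParentEntry {
  collectionId: string;
  parent: string;
}

const entries: CollectionParentEntry[] = [
  { collectionId: 'landmarks', parent: 'cities/SF' },
  { collectionId: 'landmarks', parent: 'cities/Tokyo' }
];

const parents: string[] = [];
// `{ parent }` pulls the `parent` property out of each entry, equivalent to
// writing `for (const entry of entries) { const parent = entry.parent; ... }`.
for (const { parent } of entries) {
  parents.push(parent);
}
```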

wilhuff · 2018-12-21T19:22:43Z

One other thought: could it be useful to have a test, possibly disabled by default, that exercises collection groups in combination with filters/order by?

mikelehen

Thanks for the quick review! I've fixed the small, easy things, and commented on the rest of the feedback. I'll ping you in an hour or two to see if we can chat through the unresolved high-level feedback, in particular the relationship of QueryIndexes to the rest of the components.

mikelehen · 2018-12-21T18:44:54Z

packages/firestore/src/local/indexeddb_query_indexes.ts

+    return collectionParentsStore(transaction)
+      .loadAll(range)
+      .next(entries => {
+        for (const { parent } of entries) {


Object destructuring. https://basarat.gitbooks.io/typescript/docs/destructuring.html

mikelehen · 2018-12-21T19:01:40Z

packages/firestore/src/core/query.ts

@@ -176,12 +174,31 @@ export class Query {
    );
  }

+  /**
+   * Helper to convert a Collection Group query into a collection query at a
+   * specific path.


Good call, done.

mikelehen · 2018-12-21T19:19:00Z

packages/firestore/src/local/indexeddb_query_indexes.ts

+    collectionPath: ResourcePath
+  ): PersistencePromise<void> {
+    assert(collectionPath.length >= 1, 'Invalid collection path.');
+    const collectionId = collectionPath.lastSegment();


Maaaaybe?

It is very similar code, but I can't think of a clean way to reuse it that wouldn't involve creating an awkward abstraction just for the sake of code reuse, and I hate doing that. It's tempting to just use MemoryQueryIndexes directly, but I worry that'll be really awkward / confusing in the future when we have more indexes...

It's worth noting that here (unlike MemoryQueryIndexes) we only use the in-memory cache of indexes in the write path (to avoid re-writing indexes we know exist). We don't use it in the read path since we never populate it with existing index entries from disk. So although the data structure is the same, we use it differently.

Anyway, if you have a suggestion let me know, else I'm inclined to just live with the duplication for now.

mikelehen · 2018-12-21T19:31:10Z

packages/firestore/src/local/memory_remote_document_cache.ts

@@ -50,7 +51,10 @@ export class MemoryRemoteDocumentCache implements RemoteDocumentCache {
   * @param sizer Used to assess the size of a document. For eager GC, this is expected to just
   * return 0 to avoid unnecessarily doing the work of calculating the size.
   */
-  constructor(private readonly sizer: DocumentSizer) {}
+  constructor(
+    private readonly queryIndexes: QueryIndexes,


FWIW the fullness of time is very fuzzy to me. I was imagining that we'd have a UnifiedDocumentCache component that would manage mutations and remote documents, and so it could own QueryIndexes and initiate the appropriate index updates (based on old/new values, etc.). So until then, my plan was to just have RemoteDocumentCache and MutationQueue separately do the appropriate index updates (though I forgot the MutationQueue, oops).

But I think this is probably worth talking through in chat or in-person because I don't think I'm fully grokking your suggestion.

mikelehen · 2018-12-21T19:40:36Z

packages/firestore/src/local/query_indexes.ts

+import { PersistencePromise } from './persistence_promise';
+
+/**
+ * Represents a set of indexes that are used to execute queries efficiently.


I added some comment text. I'm fuzzy on what this interface is going to look like over time though. I imagine we may end up removing indexCollectionParent() and instead have indexDocument(oldDoc, newDoc) or something (and it would implicitly index the collection parent as well).

mikelehen · 2018-12-21T19:44:04Z

packages/firestore/src/local/query_indexes.ts

+/**
+ * Represents a set of indexes that are used to execute queries efficiently.
+ */
+export interface QueryIndexes {


I'm not in love with it either. Your alternatives are the same ones I considered. :-)

I wasn't sure if Indexer was okay, since it also reads indexes (getCollectionParents()) and the rest of our persistence classes are nouns (QueryCache, RemoteDocumentCache, MutationQueue). So I think I'd slightly prefer IndexManager... but I'm kinda' allergic to managers...

I'm opposed to CollectionParentIndex because the file overhead is a pretty big pain and I'm not sure what the future of this all looks like (e.g. see comment about indexDocument(oldDoc, newDoc) above).

Anyway, I'll plan to rename this to IndexManager before final review unless you have a preference for something else.

mikelehen · 2018-12-21T19:57:38Z

packages/firestore/src/local/query_indexes.ts

+   * path for root-level collections). Index entries can be retrieved via
+   * getCollectionParents().
+   */
+  indexCollectionParent(


Good point. How about addCollectionParentEntry(), trying to be consistent with our RemoteDocumentCache.addEntry() naming? I don't care strongly though.

wilhuff · 2018-12-22T00:02:12Z

Just to quickly summarize our discussion before we go away for a while:

Let's leave QueryIndexes as a member of the remote document caches
We're doing so to ensure that all remote document update paths are caught
Rename QueryIndexes to something else
Add an index for elements of the mutation queue
Add test coverage for mutations
Add offline tests that validate CG queries with other criteria execute correctly locally

* QueryIndexes => IndexManager
* indexCollectionParent() => addToCollectionParentIndex()

mikelehen · 2019-02-04T17:07:55Z

@wilhuff I think this is ready for review. I did the renames we talked about, added tests, etc.

wilhuff

LGTM. A few more nits and we're good to go!

wilhuff · 2019-02-04T21:25:40Z

packages/firebase/index.d.ts

@@ -886,6 +886,16 @@ declare namespace firebase.firestore {
     */
    doc(documentPath: string): DocumentReference;

+    /**
+     * Gets a Query instance that will include documents from all collections and


"Gets" is the wrong verb here. Elsewhere when talking about creating a new Query object we've written "Creates and returns a new Query".

Also, we've written "@return The created Query" instead of "The Query instance." below

Changed to:

* Creates and returns a new Query that includes all documents in the
* database that are contained in a collection or subcollection with the
* given collectionId.

wilhuff · 2019-02-04T22:42:04Z

packages/firebase/index.d.ts

+     * Gets a Query instance that will include documents from all collections and
+     * subcollections in the database with the given collectionId.
+     *
+     * @param collectionId The collectionId specifying the group of collections to


"collectionId" is a term of art that's pretty specific to Firestore. Making this comment self-referential like this makes it more opaque than it needs to be. Could we include text here that indicates that a collectionId is the trailing component of a path to a collection? Also maybe note that collectionIds don't contain slashes?

Changed to:

* @param collectionId Identifies the collections to query over. Every
* collection or subcollection with this ID as the last segment of its path
* will be included. Cannot contain a slash.
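As a sketch of the rule described in that comment, the following hypothetical helper (not part of the SDK) treats a collectionId as the last segment of a slash-separated collection path and rejects IDs containing a slash:

```typescript
/**
 * Returns true if the collection at `collectionPath` belongs to the group
 * identified by `collectionId`, i.e. the ID is the last segment of the path.
 * Throws if `collectionId` contains a slash, since an ID is a single segment.
 */
function isInCollectionGroup(
  collectionPath: string,
  collectionId: string
): boolean {
  if (collectionId.indexOf('/') !== -1) {
    throw new Error(
      `Invalid collectionId '${collectionId}': it cannot contain a slash.`
    );
  }
  const segments = collectionPath.split('/');
  return segments[segments.length - 1] === collectionId;
}

const nestedMatch = isInCollectionGroup('cities/SF/landmarks', 'landmarks');
const rootMatch = isInCollectionGroup('landmarks', 'landmarks');
const noMatch = isInCollectionGroup('cities', 'landmarks');
```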

wilhuff · 2019-02-04T22:44:06Z

packages/firestore/src/local/index_manager.ts

@@ -0,0 +1,50 @@
+/**
+ * Copyright 2018 Google Inc.


wilhuff · 2019-02-04T22:47:46Z

packages/firestore/src/local/index_manager.ts

+ * Represents a set of indexes that are used to execute queries efficiently.
+ *
+ * Currently the only index is a [collection id] => [parent path] index, used
+ * to execute Collection Group queries. When we implement property indexing in


All code is subject to change as time progresses. I realize you're not wild about the fog of war surrounding our eventual goals with indexing, but it seems as if this last sentence isn't adding much value.

wilhuff · 2019-02-04T22:58:28Z

packages/firestore/src/remote/serializer.ts

+    if (query.collectionGroup !== null) {
+      assert(
+        path.length % 2 === 0,
+        'Collection Group queries should be within a document path.'


Nit: Does the root of the database count as a document path? Maybe "within a document path or root"?

Sure, done.
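The even-length check covers both cases raised here, which a tiny sketch can make explicit. It assumes paths are modeled as segment arrays, with the database root being the empty path; the function name is hypothetical.

```typescript
/**
 * Paths alternate collection and document segments, so a document path has
 * an even, non-zero length and the database root has length 0 (also even).
 * Collection paths have odd length.
 */
function isDocumentPathOrRoot(path: string[]): boolean {
  return path.length % 2 === 0;
}

const rootOk = isDocumentPathOrRoot([]);
const documentOk = isDocumentPathOrRoot(['cities', 'SF']);
const collectionOk = isDocumentPathOrRoot(['cities']);
```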

wilhuff · 2019-02-04T23:00:52Z

packages/firestore/test/integration/api/query.test.ts

@@ -566,4 +567,124 @@ apiDescribe('Queries', persistence => {
      }
    }).to.throw(expectedError);
  });
+
+  it('support collection groups', async () => {


When reading this as English I would expect this to read "it supports collection groups". Would it be reasonable to make these it('supports ...', ...)?

Eh. I think the verb here should match up with the noun in the describe() function so e.g. in this case the full test case reads "Queries support ..."

Following the example of other tests in this file I've chosen to side-step the issue with it('can query collection groups ...')

mikelehen

Thanks! Nits resolved. I think this is good to go, pending backend durability.

mikelehen · 2019-02-04T23:31:30Z

packages/firebase/index.d.ts

@@ -886,6 +886,16 @@ declare namespace firebase.firestore {
     */
    doc(documentPath: string): DocumentReference;

+    /**
+     * Gets a Query instance that will include documents from all collections and


Changed to:

* Creates and returns a new Query that includes all documents in the
* database that are contained in a collection or subcollection with the
* given collectionId.

mikelehen · 2019-02-04T23:38:58Z

packages/firebase/index.d.ts

+     * Gets a Query instance that will include documents from all collections and
+     * subcollections in the database with the given collectionId.
+     *
+     * @param collectionId The collectionId specifying the group of collections to


Changed to:

* @param collectionId Identifies the collections to query over. Every
* collection or subcollection with this ID as the last segment of its path
* will be included. Cannot contain a slash.

mikelehen · 2019-02-04T23:44:24Z

packages/firestore/src/local/index_manager.ts

@@ -0,0 +1,50 @@
+/**
+ * Copyright 2018 Google Inc.


mikelehen · 2019-02-04T23:45:26Z

packages/firestore/src/local/index_manager.ts

+ * Represents a set of indexes that are used to execute queries efficiently.
+ *
+ * Currently the only index is a [collection id] => [parent path] index, used
+ * to execute Collection Group queries. When we implement property indexing in


mikelehen · 2019-02-04T23:47:32Z

packages/firestore/src/remote/serializer.ts

+    if (query.collectionGroup !== null) {
+      assert(
+        path.length % 2 === 0,
+        'Collection Group queries should be within a document path.'


Sure, done.

mikelehen · 2019-02-04T23:50:37Z

packages/firestore/test/integration/api/query.test.ts

@@ -566,4 +567,124 @@ apiDescribe('Queries', persistence => {
      }
    }).to.throw(expectedError);
  });
+
+  it('support collection groups', async () => {


Eh. I think the verb here should match up with the noun in the describe() function so e.g. in this case the full test case reads "Queries support ..."

Following the example of other tests in this file I've chosen to side-step the issue with it('can query collection groups ...')

mikelehen · 2019-03-08T16:23:53Z

@Feiyang1 Can you approve the change to packages/firebase/index.d.ts? [Note: the change is actually commented out for now because we can't expose the API yet.]

Michael Lehenbauer added 2 commits December 21, 2018 08:27

Collection Group Queries w/ indexing

0db1e40

[AUTOMATED]: Prettier Code Styling

f516eec

mikelehen added the api: firestore label Dec 21, 2018

mikelehen assigned wilhuff Dec 21, 2018

mikelehen requested review from bojeil-google, depoll, Feiyang1, gsoltis, hiranya911, rsgowman, schmidt-sebastian, var-const, wilhuff and zxu123 as code owners December 21, 2018 16:37

google-oss-bot added the needs-triage label Dec 21, 2018

wilhuff reviewed Dec 21, 2018

mikelehen added feature-request and removed needs-triage labels Dec 21, 2018

Initial review feedback.

75f4d53

mikelehen commented Dec 21, 2018

wilhuff assigned mikelehen and unassigned wilhuff Dec 22, 2018

Michael Lehenbauer added 6 commits January 23, 2019 17:29

CR Feedback: QueryIndexes renames.

37cf4c2

* QueryIndexes => IndexManager
* indexCollectionParent() => addToCollectionParentIndex()

Create index entries from MutationQueue.addMutationBatch().

7160856

Add spec tests.

22af7fa

Merge branch 'master' into mikelehen/collection-group-queries

6d4b904

Index existing data.

57fa71d

[AUTOMATED]: Prettier Code Styling

c65b539

mikelehen changed the title ~~Collection Group Queries w/ indexing [WIP -- not ready for submission]~~ Collection Group Queries w/ indexing Feb 4, 2019

mikelehen assigned wilhuff and unassigned mikelehen Feb 4, 2019

Michael Lehenbauer added 5 commits February 4, 2019 09:10

Delete accidentally checked-in vim swap file.

14b47d6

Update changelog.

30298ba

Tweak test to use path().

33f10f2

Add asserts to verify collection paths.

22567a9

Simplify schema migration test.

412eb1a

wilhuff approved these changes Feb 4, 2019

wilhuff assigned mikelehen and unassigned wilhuff Feb 4, 2019

CR feedback.

62d6024

mikelehen commented Feb 5, 2019

Tweak comment.

e37728a

mikelehen mentioned this pull request Feb 5, 2019

Collection Group Queries firebase/firebase-android-sdk#233

Merged

CR Feedback.

493ea7b

mikelehen pushed a commit to firebase/firebase-ios-sdk that referenced this pull request Feb 14, 2019

Collection Group queries.

2fc6e4c

Port of firebase/firebase-js-sdk#1440.

mikelehen pushed a commit to firebase/firebase-ios-sdk that referenced this pull request Feb 14, 2019

Collection Group queries.

419394f

Port of firebase/firebase-js-sdk#1440.

mikelehen mentioned this pull request Feb 14, 2019

Collection Group Queries port to iOS firebase/firebase-ios-sdk#2378

Merged

Michael Lehenbauer added 3 commits February 21, 2019 10:24

Port minor CR feedback back to JS.

20c2e9e

Merge branch 'master' into mikelehen/collection-group-queries

60a5c48

Hide public API for CG queries until backend support is ready.

0a89e03

mikelehen assigned Feiyang1 Mar 8, 2019

Feiyang1 approved these changes Mar 8, 2019

mikelehen merged commit 21c0f3c into master Mar 8, 2019

mikelehen deleted the mikelehen/collection-group-queries branch March 8, 2019 18:37

schmidt-sebastian mentioned this pull request Mar 11, 2019

feature: Adding CollectionGroup queries googleapis/nodejs-firestore#578

Merged

firebase locked and limited conversation to collaborators Oct 14, 2019
