Skip to content

Add IndexFreeQueryEngine #2169

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Sep 17, 2019
Merged

Conversation

schmidt-sebastian
Copy link
Contributor

This PR ports the current state of https://github.com/firebase/firebase-android-sdk/blob/master/firebase-firestore/src/main/java/com/google/firebase/firestore/local/IndexFreeQueryEngine.java and https://github.com/firebase/firebase-android-sdk/blob/master/firebase-firestore/src/test/java/com/google/firebase/firestore/local/IndexFreeQueryEngineTest.java, as well as the updated plumbing for executeQuery() (which allows us to pass in the remoteKeys if we already have them, since we don't want to fetch them twice).

This is the last PR for Index-Free queries that adds logic to the client. There will be one more PR that adds new tests to the LocalStore.

Copy link
Contributor

@mikelehen mikelehen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I failed to find any bugs, but I did not let that stop me from complaining!

return queryView.view.computeDocChanges(docs, viewDocChanges);
});
return this.localStore
.executeQuery(queryView.query, null, documentKeySet())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like queryData / remoteKeys are optional, so you can just leave them off here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing null makes sure we don't look up any QueryData, essentially disabling Index-Free execution. I agree that this is pretty subtle here. On Android it is slightly more obvious since we have two overloads, but in general I agree that we should probably be more explicit (more on that below).

/**
* A query engine that takes advantage of the target document mapping in the
* QueryCache. The IndexFreeQueryEngine optimizes query execution by only
* reading the documents previously matched a query plus any documents that were
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"documents previously" => "documents that previously"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

* edited after the query was last listened to.
*
* There are some cases where Index-Free queries are not guaranteed to produce
* to the same results as full collection scans. In these case, the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"produce to the same" => "produce the same"
"these case" => "these cases" or "this case"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

*
* There are some cases where Index-Free queries are not guaranteed to produce
* to the same results as full collection scans. In these case, the
* IndexFreeQueryEngine falls back to a full query processing. These cases are:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"to a full query processing" => "to full query processing" (or something)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Sorry :/

* - Limit queries where a document that matched the query previously no longer
* matches the query. In this case, we have to scan all local documents since a
* document that was sent to us as part of a different query result may now fall
* into the limit.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"a document that was sent to us as part of a different query result" should really just be "a cached document", I think... It could have been cached in a variety of ways (e.g. it used to match this query)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. I used "another cached document".

const UPDATED_MATCHING_DOC_B = doc('coll/b', 11, { matches: true, order: 2 });

/**
* A LocalDocumentsView that inspects the arguments to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add "wrapper" to this description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

verifyResult(docs, [MATCHING_DOC_A, MATCHING_DOC_B]);
});

it('filters non-matching initial results', async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what "initial results" refers to... should this be "filters non-matching changes since initial results" by chance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Renamed.

.withLimit(1);

// Add a query mapping for a document that matches, but that sorts below
// another document based due to an update that the SDK received after the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"based due"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

version(10),
hasLimboFreeSnapshot ? version(10) : SnapshotVersion.MIN
);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these helpers at the bottom instead of the top?

Copy link
Contributor Author

@schmidt-sebastian schmidt-sebastian Sep 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They don't need any state from the test and I thought this would be a good place to hide them.

I did end up removing testQueryData and am using the queryData from helpers.ts instead. Update: The test no longer needs to use QueryData since the API only requires the lastLimboFreeSnapshotVersion.

@@ -73,7 +73,7 @@ import {
} from '../../util/helpers';

import { FieldValue, IntegerValue } from '../../../src/model/field_value';
import { CountingQueryEngine } from './forwarding_counting_query_engine';
import { CountingQueryEngine } from './counting_query_engine';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😲

Copy link
Contributor Author

@schmidt-sebastian schmidt-sebastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your thorough review. Looks like I have to do some comment cleanup in Android.

return queryView.view.computeDocChanges(docs, viewDocChanges);
});
return this.localStore
.executeQuery(queryView.query, null, documentKeySet())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing null makes sure we don't look up any QueryData, essentially disabling Index-Free execution. I agree that this is pretty subtle here. On Android it is slightly more obvious since we have two overloads, but in general I agree that we should probably be more explicit (more on that below).


/**
* A query engine that takes advantage of the target document mapping in the
* QueryCache. The IndexFreeQueryEngine optimizes query execution by only
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of "ThreadLocalFreeQueryEngine"? It sounds even more advanced and the implementation is actually free of them. :)

In all honesty, I can't come up with a better name right now. I filed b/140938512 and will tackle this once we remove SimpleQueryEngine.

/**
* A query engine that takes advantage of the target document mapping in the
* QueryCache. The IndexFreeQueryEngine optimizes query execution by only
* reading the documents previously matched a query plus any documents that were
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

* edited after the query was last listened to.
*
* There are some cases where Index-Free queries are not guaranteed to produce
* to the same results as full collection scans. In these case, the
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

*
* There are some cases where Index-Free queries are not guaranteed to produce
* to the same results as full collection scans. In these case, the
* IndexFreeQueryEngine falls back to a full query processing. These cases are:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Sorry :/

verifyResult(docs, [MATCHING_DOC_A, MATCHING_DOC_B]);
});

it('filters non-matching initial results', async () => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Renamed.

.withLimit(1);

// Add a query mapping for a document that matches, but that sorts below
// another document based due to an update that the SDK received after the
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

version(10),
hasLimboFreeSnapshot ? version(10) : SnapshotVersion.MIN
);
}
Copy link
Contributor Author

@schmidt-sebastian schmidt-sebastian Sep 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They don't need any state from the test and I thought this would be a good place to hide them.

I did end up removing testQueryData and am using the queryData from helpers.ts instead. Update: The test no longer needs to use QueryData since the API only requires the lastLimboFreeSnapshotVersion.

executeQuery(
query: Query,
queryData: QueryData | null,
remoteKeys: DocumentKeySet
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me discuss the least important issue here first:

One of my goals for this API was to be relatively implementation agnostic. That's why I opted to pass in QueryData rather than just what is needed.

As for the rest of the API, there is indeed a bit too much magic here. I tried to come up with a better interaction model when I worked on the Android implementation. The other ideas that I played with at the time did not win me over (having a second refillQuery() method or explicitly changing the query data in a secondary call to remove the last limbo free snapshot version). That's how I ended up with this magic.

The "magic" provides with the following functionality:

  • A way to refill the query.
  • A way to let FirestoreClient use Index-Free, even though it does not have access to any state from LocalStore.
  • We don't need to perform two separate lookups for remoteKeys.

For now, I updated the PR to be more explicit. I now have three separate signatures:

 /**
   * Runs the specified query against the local store and returns the results,
   * potentially taking advantage of query data from previous executions (such
   * as the set of remote keys).
   */
  executeQuery(query: Query): Promise<DocumentMap>;
  /**
   * Performs a full collection scan for the provided query and returns the
   * results. Does not take into account metadata from prior executions.
   */
  executeQuery(
    query: Query,
    options: { needsRefill: true }
  ): Promise<DocumentMap>;
  /**
   * Runs the specified query against the local store and returns the results,
   * potentially taking advantage of the provided query data and the set of
   * remote document keys.
   */
  executeQuery(
    query: Query,
    options: { queryData: QueryData; remoteKeys: DocumentKeySet }
  ): Promise<DocumentMap>;

return PersistencePromise.resolve()
.next(() => {
// If QueryData is not provided (and not explicitly set to `null`), we
// retrieve the query data and the remote keys from cache.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (as shown above).

@schmidt-sebastian schmidt-sebastian removed their assignment Sep 12, 2019
Copy link
Contributor

@mikelehen mikelehen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nits in addition to my main concern about the executeQuery() signature. Feel free to ping me if you want to chat through alternatives or something.

* Returns the documents for the specified remote keys if they still match the
* query, sorted by the query's comparator.
*/
getSortedPreviousResults(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the sorting is crucial, but I also think it's crucial for executeQuery...() methods to return sorted results in general, so I'm not completely convinced it's necessary to call out the sorting in the method name... especially since the filtering is also crucial but is not called out (and is in fact deemphasized by renaming query to sortByQuery when it's also being used for filtering...)

I think executeQueryAgainstDocuments() is my preference at this point. It's not a huge deal though, so if you're not won over, you can leave as-is.

previousResults.forEach(doc => {
updatedResults = updatedResults.insert(doc.key, doc);
});
return updatedResults;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! I see. Probably not a big deal, but a comment would still save me some head-scratching. Or you could add an "optimization" that also removes the concern I had:

if (updatedResults.get(doc.key) === null) {
  updatedResults = updatedResults.insert(doc.key, doc);
}

I also don't care too strongly if you want to leave it as-is though.


/**
* A query engine that takes advantage of the target document mapping in the
* QueryCache. The IndexFreeQueryEngine optimizes query execution by only
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The better name I am hoping for eventually is "QueryEngine" 😄

});
}

beforeEach(async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... or just ignore the feedback. That's fine too. 😛

remoteKeys: DocumentKeySet
): PersistencePromise<SortedSet<Document>> {
// Fetch the documents that matched the query at the last snapshot.
return this.localDocumentsView!.getDocuments(transaction, remoteKeys).next(
previousResults => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename previousResults since the method name / args no longer refer to "previous"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left it as is since the name seems significant in the context of this method.

executeQuery(
query: Query,
queryData: QueryData | null,
remoteKeys: DocumentKeySet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW (you're still free to push back / suggest alternatives) - Fleshing out my proposal a bit more, I think I'd maybe change CachedResultsInfo.lookup(targetId) to CachedResultsInfo.unknown(targetId) and add CachedResultsInfo.None, which would be the default behavior if no CachedResultsInfo was passed.

version(10),
hasLimboFreeSnapshot ? version(10) : SnapshotVersion.MIN
);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't care super strongly, but as a general rule I hate test boilerplate and so I preferred having the specialized helper. Was there a reason to remove it?

Copy link
Contributor Author

@schmidt-sebastian schmidt-sebastian Sep 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated: Helper no longer needed.

});
}

beforeEach(async () => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See below: "They don't need any state from the test and I thought this would be a good place to hide them." :)

previousResults.forEach(doc => {
updatedResults = updatedResults.insert(doc.key, doc);
});
return updatedResults;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment, but left the code as is.

* Returns the documents for the specified remote keys if they still match the
* query, sorted by the query's comparator.
*/
getSortedPreviousResults(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main executeQuery() does not return sorted results, but rather returns a DocumentMap keyed by document name. Because I spent a ton of time thinking about executeQuery() recently, to me the sorting aspect is not as obvious.

Either way, I think I managed to address both of our concerns. I removed the offending method and am not fetching the results in getDocumentsMatchingQuery(). The sorting and filtering now takes place in applyQuery.

remoteKeys: DocumentKeySet
): PersistencePromise<SortedSet<Document>> {
// Fetch the documents that matched the query at the last snapshot.
return this.localDocumentsView!.getDocuments(transaction, remoteKeys).next(
previousResults => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left it as is since the name seems significant in the context of this method.

version(10),
hasLimboFreeSnapshot ? version(10) : SnapshotVersion.MIN
);
}
Copy link
Contributor Author

@schmidt-sebastian schmidt-sebastian Sep 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated: Helper no longer needed.

).snapshot!;
});
return this.asyncQueue.enqueue(async () => {
const queryResult = await this.localStore.executeQuery(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this to all run on the async queue.

query,
/* usePreviousResults= */ true
);
const view = new View(query, queryResult.remoteKeys);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

queryResult.remoteKeys is used now instead of null, which seems more correct, but doesn't change the state of the fromCache flag since it is set to true either way.

executeQuery(
query: Query,
queryData: QueryData | null,
remoteKeys: DocumentKeySet
Copy link
Contributor Author

@schmidt-sebastian schmidt-sebastian Sep 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this based on our offline discussion:

  • executeQuery() now returns the remoteKeys, which means that there are now only two cases this method needs to address (Index-Free: yes or no). We don't need to pass remote keys since the method returns them. As discussed, this adds slight overhead to limit query refills.

  • This change also solves a potential multi-tab race since the set of remote keys and the set of remote documents are now fetched in a single transaction.

  • I also changed the QueryEngine to only take the lastLimbFreeSnapshotVersion instead of QueryData.

});
return this.localStore
.executeQuery(queryView.query, /* usePreviousResults= */ false)
.then(({ documents }) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I verified that the transpiler unpacks this (in cjs and esm mode).

Copy link
Contributor

@mikelehen mikelehen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with minor request...

const cachedQueryData = this.queryDataByTarget[queryData!.targetId];
return cachedQueryData || queryData;
});
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm... We are trading a "slow" O(1) operation [or probably O(log(n) I guess] for a "fast" O(n) operation. I know this is only an in-memory O(n) scan and n probably won't be huge most of the time... but it means adding n listeners will do n^2 operations and performance will degrade as you add each additional listener, which is not how the SDK is meant to scale (doing 1000 listeners should be ~1000 times more work than 1 listener, not ~1,000,000 times).

I don't know for sure how big n needs to be to matter, but in general I would prefer to just use the right data structures so that we get the computational complexity that we expect. So I think the right thing to do would be to add a "reverse" queryDataByQuery (or targetIdByQuery) ObjectMap so that we can do a fast lookup.

WDYT? At minimum, I think this would deserve a TODO(perf) comment to flag that we're doing an unnecessary O(n) operation here that could be optimized away.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge this as is and will send you a follow-up PR soonish. My gut feeling is that n has to be fairly large to be slower than an IndexedDb lookup. This is mostly based on how much faster our Memory persistence layer is in compared to our IndexedDb-backed implementation.

return null;
let queryData: QueryData | null = null;

// Look up the query data in our local map first, as this avoid a potential
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoid=>avoids

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@schmidt-sebastian schmidt-sebastian merged commit 539ab0d into mrschmidt/indexfree-master Sep 17, 2019
@schmidt-sebastian schmidt-sebastian deleted the mrschmidt/engine branch October 10, 2019 22:55
@firebase firebase locked and limited conversation to collaborators Oct 18, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants