Skip to content

Initial base classes for index backfill #2950

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Sep 9, 2021
Merged

Conversation

thebrianchen
Copy link

No description provided.

@thebrianchen thebrianchen self-assigned this Sep 2, 2021
@google-cla google-cla bot added the cla: yes Override cla label Sep 2, 2021
@google-oss-bot
Copy link
Contributor

google-oss-bot commented Sep 2, 2021

Coverage Report

Affected SDKs

  • firebase-firestore

    SDK overall coverage changed from 41.43% (e41b233) to 41.48% (10c3770c) by +0.05%.

    Filename Base (e41b233) Head (10c3770c) Diff
    AsyncQueue.java 76.88% 77.00% +0.12%
    ComponentProvider.java 100.00% 97.30% -2.70%
    FirestoreClient.java 29.60% 29.01% -0.59%
    IndexBackfiller.java ? 50.00% ?
    IndexEntry.java ? 100.00% ?
    LocalStore.java 97.15% 96.81% -0.34%
    SQLitePersistence.java 83.06% 83.24% +0.18%
    Scheduler.java ? 0.00% ?

Test Logs

Notes

HTML coverage reports can be produced locally with ./gradlew <product>:checkCoverage.
Report files are located at <product-build-dir>/reports/jacoco/.

Head commit (10c3770c) is created by Prow via merging commits: e41b233 db08413.

@google-oss-bot
Copy link
Contributor

google-oss-bot commented Sep 2, 2021

Binary Size Report

Affected SDKs

  • firebase-firestore

    Type Base (e41b233) Head (10c3770c) Diff
    aar 1.18 MB 1.19 MB +5.33 kB (+0.4%)
    apk (release) 3.24 MB 3.24 MB +1.79 kB (+0.1%)

Test Logs

Notes

Head commit (10c3770c) is created by Prow via merging commits: e41b233 db08413.

Copy link
Contributor

@schmidt-sebastian schmidt-sebastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, but it could be simplified by making it a bit less generic. We should also figure out how to run this optionally when the indexing flag is enabled.

@@ -143,6 +145,10 @@ public FirestoreClient(
if (gcScheduler != null) {
gcScheduler.stop();
}

if (indexScheduler != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this scheduler always run? We may also need to backfill indices that were configured after client startup.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind, the scheduler should always run, but whether the scheduler schedules anything is another question. We can change this as necessary as we build more functionality out.

/** How long we wait to try running LRU GC after SDK initialization. */
private static final long INITIAL_GC_DELAY_MS = TimeUnit.MINUTES.toMillis(1);
/** Minimum amount of time between GC checks, after the first one. */
private static final long REGULAR_GC_DELAY_MS = TimeUnit.MINUTES.toMillis(5);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I think we should be much more aggressive.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to 30 seconds and every minute. We can tune this number once we can run benchmarks.

}
}

public class Scheduler implements StartStopScheduler {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to figure out what functionality a StartStopSchedule offers over a Scheduler, but I am not convinced I will ever find out :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed the base class to Scheduler. To be honest, I'm not sure why the LruGarbageCollector's Scheduler used the abstract base class to begin with since it's not shared with any other class.

* Persistence layers intending to perform index backfills should implement this interface. This
* interface defines the operations that the Index Backfiller needs from the persistence layer.
*/
public interface IndexBackfillerDelegate {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the indirection with the delegate?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the delegate. Initially, I thought we would need to support MemoryPersistence in case the user disables persistence in their app, but I realize we can work around that.


@Override
@Nullable
public IndexEntry getIndexEntry(int indexId) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this belongs here. You should only be writing. We also need to figure out how to index partial document updates, but even for those I don't think we should have a public API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take this back if you want to use this for testing - but then this should be marked as such.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave this here for the base prototype, and add more features to it as we build it out.


@RunWith(RobolectricTestRunner.class)
@Config(manifest = Config.NONE)
public class MemoryIndexBackfillerTest extends IndexBackfillerTestCase {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test will not test anything, will it? We can probably just have a single SQLLite backfill test.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Author

@thebrianchen thebrianchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the early stage review!

/** How long we wait to try running LRU GC after SDK initialization. */
private static final long INITIAL_GC_DELAY_MS = TimeUnit.MINUTES.toMillis(1);
/** Minimum amount of time between GC checks, after the first one. */
private static final long REGULAR_GC_DELAY_MS = TimeUnit.MINUTES.toMillis(5);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to 30 seconds and every minute. We can tune this number once we can run benchmarks.

@@ -143,6 +145,10 @@ public FirestoreClient(
if (gcScheduler != null) {
gcScheduler.stop();
}

if (indexScheduler != null) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind, the scheduler should always run, but whether the scheduler schedules anything is another question. We can change this as necessary as we build more functionality out.

}
}

public class Scheduler implements StartStopScheduler {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed the base class to Scheduler. To be honest, I'm not sure why the LruGarbageCollector's Scheduler used the abstract base class to begin with since it's not shared with any other class.

* Persistence layers intending to perform index backfills should implement this interface. This
* interface defines the operations that the Index Backfiller needs from the persistence layer.
*/
public interface IndexBackfillerDelegate {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the delegate. Initially, I thought we would need to support MemoryPersistence in case the user disables persistence in their app, but I realize we can work around that.


@Override
@Nullable
public IndexEntry getIndexEntry(int indexId) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave this here for the base prototype, and add more features to it as we build it out.


@RunWith(RobolectricTestRunner.class)
@Config(manifest = Config.NONE)
public class MemoryIndexBackfillerTest extends IndexBackfillerTestCase {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -143,6 +145,10 @@ public FirestoreClient(
if (gcScheduler != null) {
gcScheduler.stop();
}

if (indexScheduler != null && Persistence.INDEXING_SUPPORT_ENABLED) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check the flag here? The scheduler would not be available if the flag is off.

The name is also a bit generic. I wonder if we should include "backfill" in the scheduler name, which reduces the mental scope from "all of indexing" to "backfill index data".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the flag and renamed to indexBackfillScheduler.

@@ -264,6 +271,10 @@ private void initialize(Context context, User user, FirebaseFirestoreSettings se
if (gcScheduler != null) {
gcScheduler.start();
}

if (indexScheduler != null && Persistence.INDEXING_SUPPORT_ENABLED) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should guard the scheduler assignment based on this flag (line 265). Ideally, if indexing is off, none of the logic should even be loaded. It otherwise becomes hard to reason about the system - e.g. the constructor code of the scheduler needs to be aware that indexing might be on or off.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guarded based on the flag and persistence being enabled.


/**
* Represents an index entry saved by the SDK in the local storage. Temporary placeholder, since
* we'll probably serialize the indexValue right away rather than store it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add "TODO(indexing)".

Note that we already committed FieldIndex. In #2963, FieldIndex also gets an indexId.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return new BackfillScheduler(asyncQueue, localStore);
}

// TODO: Figure out which index entries to backfill.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO(indexing)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@Before
public void setUp() {
persistence = PersistenceTestHelpers.createSQLitePersistence();
;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tzzzz.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😨

return new Results(/* hasRun= */ true, numIndexesWritten, numIndexesRemoved);
}

public void addIndexEntry(IndexEntry entry) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods would probably also be private if it was not for your tests. They should also be marked @VisibleForTesting if that is true.

All of the test-only methods should also be package-private.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

public void addAndRemoveIndexEntry() {
IndexEntry testEntry =
new IndexEntry(1, "TEST_BLOB".getBytes(), "sample-uid", "sample-documentId");
backfiller.addIndexEntry(testEntry);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addIndex and related methods should be scoped to a SQLite transaction and not be run without it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@thebrianchen thebrianchen merged commit 8122d55 into master Sep 9, 2021
@thebrianchen thebrianchen deleted the bc/index-scheduler branch September 9, 2021 21:00
@firebase firebase locked and limited conversation to collaborators Oct 10, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cla: yes Override cla size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants