Concurrency page and more accurate tracking #1252

Merged Aug 13, 2024 (19 commits)

matt-aitken (Member) commented Aug 11, 2024

In the current UI, the Tasks page has a "Running" column. You would think this shows how much concurrency you are using, but it doesn't. It shows the number of tasks that have actually started executing code, which is different from the number that have been dequeued (and are just about to start executing). The difference can be extreme for runs that do very little compute.

This PR adds accurate tracking of actual concurrency. Any run that has been dequeued but not yet acked or nacked counts towards your concurrency. We track this in Redis by:

  • Task
  • Environment
  • Globally for dev
  • Globally for deployed
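
To make the counting rule concrete, here is a minimal sketch of the idea, not the actual `TaskRunConcurrencyTracker` code. It assumes each counter behaves like a Redis set of run IDs (add on dequeue, remove on ack or nack, count the members), and it uses an in-memory `Map` as a stand-in for Redis so it runs on its own. The class name, the key formats, the `DequeuedRun` shape and the `messageReleased` method are all hypothetical. Only the name `messageDequeued` appears in this PR's commit list.

```typescript
// Hypothetical sketch: in-memory stand-in for Redis sets (think SADD / SREM / SCARD).
type EnvironmentType = "DEVELOPMENT" | "DEPLOYED";

// Assumed shape of the metadata attached to a run when it is enqueued.
interface DequeuedRun {
  runId: string;
  taskIdentifier: string;
  environmentId: string;
  environmentType: EnvironmentType;
}

class ConcurrencyTrackerSketch {
  // key -> set of run IDs currently counted against that key
  private sets = new Map<string, Set<string>>();

  // Every key a run counts towards: its task, its environment, and one global key.
  // The key formats are assumptions made for this example.
  private keysFor(run: DequeuedRun): string[] {
    return [
      `task:${run.environmentId}:${run.taskIdentifier}`,
      `env:${run.environmentId}`,
      run.environmentType === "DEVELOPMENT" ? "global:dev" : "global:deployed",
    ];
  }

  // A dequeued run counts towards concurrency, even before it executes any code.
  messageDequeued(run: DequeuedRun): void {
    for (const key of this.keysFor(run)) {
      let members = this.sets.get(key);
      if (!members) {
        members = new Set<string>();
        this.sets.set(key, members);
      }
      // Set semantics: dequeuing the same run twice still counts once.
      members.add(run.runId);
    }
  }

  // An acked or nacked run stops counting towards concurrency.
  messageReleased(run: DequeuedRun): void {
    for (const key of this.keysFor(run)) {
      this.sets.get(key)?.delete(run.runId);
    }
  }

  // Current concurrency for a key (0 if nothing has been tracked under it).
  count(key: string): number {
    return this.sets.get(key)?.size ?? 0;
  }
}

// Usage: one dev run is dequeued, then acked.
const tracker = new ConcurrencyTrackerSketch();
const exampleRun: DequeuedRun = {
  runId: "run_example",
  taskIdentifier: "hello-world",
  environmentId: "env_example",
  environmentType: "DEVELOPMENT",
};
tracker.messageDequeued(exampleRun);
console.log(tracker.count("global:dev")); // 1
tracker.messageReleased(exampleRun);
console.log(tracker.count("global:dev")); // 0
```

Using a set of run IDs rather than a plain counter means a duplicate dequeue or a repeated ack cannot push the number out of sync, which matters when the goal is an accurate figure.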

Additionally, a new Concurrency page has been added that surfaces this data:

[Screenshot: CleanShot 2024-08-12 at 12 44 21@2x]

changeset-bot bot commented Aug 11, 2024

⚠️ No Changeset found

Latest commit: 34e4b29

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

nicktrn merged commit da6ce3c into main on Aug 13, 2024
2 checks passed
nicktrn deleted the improved-concurrency-tracking branch on August 13, 2024 10:43
matt-aitken added a commit that referenced this pull request Aug 15, 2024
* Initial TaskRunConcurrencyTracker implementation

* MARQS calls a subscriber to events

* When enqueuing add the extra required metadata

* Track concurrency per environment for tasks too

* Admin page for global concurrency

* Use the new concurrency tracker on the tasks page

* Useful performance test task

* getAllTaskIdentifiers()

* New page for concurrency

* BackgroundWorkerTask index for quick lookup of task identifiers

* Added a way to get concurrency for environments

* Added upgrade/request more concurrency button

* Queued task column working

* Use defer and suspense

* Added queue column to the concurrency environments table

* Some comments added for clarity

* Fixed bad log message

* Sidemenu: move lower and rename to “Concurrency limits”

* Only show the environments, not tasks. Renamed to “Concurrency limits”
matt-aitken added a commit that referenced this pull request Sep 4, 2024
* WIP on using react-window-splitter

* WIP with new resizable panels and SSR

* Use the cookie package

* Resizable storybook page

* Increase indexing memory limit

* Fixed v2 usage meter displaying when on paid plan (#1255)

* Fixed v2 usage meter displaying when on paid plan

* Show the free usage panel only for v3 projects

* Concurrency page and more accurate tracking (#1252)

* Initial TaskRunConcurrencyTracker implementation

* MARQS calls a subscriber to events

* When enqueuing add the extra required metadata

* Track concurrency per environment for tasks too

* Admin page for global concurrency

* Use the new concurrency tracker on the tasks page

* Useful performance test task

* getAllTaskIdentifiers()

* New page for concurrency

* BackgroundWorkerTask index for quick lookup of task identifiers

* Added a way to get concurrency for environments

* Added upgrade/request more concurrency button

* Queued task column working

* Use defer and suspense

* Added queue column to the concurrency environments table

* Some comments added for clarity

* Fixed bad log message

* Sidemenu: move lower and rename to “Concurrency limits”

* Only show the environments, not tasks. Renamed to “Concurrency limits”

* v3: fix unfreezable state crashes for runs with multiple waits (#1253)

* support named capture groups

* write crash errors to attempt.error

* make restored pod names unique per checkpoint

* use last eight characters of checkpoint id instead

* add more chaos monkey env vars

* Ignore unfreezable states

* prevent excessive queue config parsing errors

* handle dependency resume edge case

* better entry point logging

* ignore checkpoint cancellation timeouts

* add missing idempotency keys to wait for dep replays

* remove checkpoints between attempts

* fix retry container names on kubernetes

* add changeset

* fix types

* bring back internal duration timers

* Added more logging to TaskRunConcurrencyTracker and some more try/catches

* Call subscriber.messageDequeued in dequeueMessageInSharedQueue

* Added messageReplaced to concurrency tracking (when freezing)

* Added depenenciesToBundle guide to bundle all packages

* Include the old message data when replacing, so we get the projectId etc.

* Fix restored container names

* Fix for schedule page not scrolling

* Added a description panel to the Concurrency admin page

* chore: Update version for release (beta) (#1256)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Release 3.0.0-beta.53

* Added a note to use batchTrigger() instead of trigger()

* The latest react-window-splitter fixes the ESM issues

* Set sensible defaults for the run page

* Deployments page

* Test page

* Schedules page

* Latest version of react-window-splitter (0.2.5)

* Updated to the latest version: react-window-splitter

* Callout if runs don’t start right away now has some top margin

* Small padding fix

* styled the handle focus state

* Added isStaticAtRest prop to resizable panel

* Updated resizable storybook

* Inline code blocks behave nicer when text wraps

* Added ElectricSQL to docker-compose, available on 3060

* Extracted some logic out of the eventRepository for getting a trace. This will be used on the frontend

* Use the new util

* More restructuring ready to use the trace summary from the frontend

* Using ElectricSQL for the run page data

* Min size for resizable panel on test page

* Don’t load the trace in the RunPresenter anymore

* Fix for the resizable panels on the run page

* Added overflow hidden to the panel group

* min size for the test page left hand panel

* Updated to latest window-splitter version

* Removed unused const

* One fix for client-server mismatch

* Slight improvement in the loading state

* Restructured the page so the loading is better

* Improvement to the loading states

* Improved the loading behaviour with the inspector

* WIP on auth, having problems with it

* Upgrade Remix to 2.9.1 (same as PR #1096)

* Switched structure around again so we only call the useTrace hook from the client

* Added auth to the sync

* Overscan more rows in the tree view

* Fix for TS error

* Remove duplicate import

* Revert "Upgrade Remix to 2.9.1 (same as PR #1096)"

This reverts commit e63ee9e.

* save cookie only when id is used

* Deployment table now scrolls

* removed imports

* A lot of changes to make the inspector live too… WIP

* More major overhauls to get the synced version of the run page working…

* If a span is completed show that

* Set the debounce much lower for selecting the span view

* Load the details run inspector data on demand

* Delete the SpanPresenter

* Use the async payload because it deals with superjson

* Fixed weird merge conflict

* Share some inspector timeline components

* A couple of layout tweaks

* Improved the run inspector loading states

* Fix for paragaph errors

* Fix for focusing on a span

* Undefined typre for useSyncedShape

* ELECTRIC_ORIGIN env var doesn’t have a default, added to the examples

* Updated @electric-sql/react package to the latest

* Fix the timeline duration stretching

* Added some better error handling for the electric sync

* More logging

* Better error when there are bad responses

* Turn off resizable snapshots, there’s a bug

---------

Co-authored-by: nicktrn <[email protected]>
Co-authored-by: James Ritchie <[email protected]>
Co-authored-by: James Ritchie <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
matt-aitken added a commit that referenced this pull request Sep 13, 2024
* WIP on using react-window-splitter

* WIP with new resizable panels and SSR

* Use the cookie package

* Resizable storybook page

* Increase indexing memory limit

* Fixed v2 usage meter displaying when on paid plan (#1255)

* Fixed v2 usage meter displaying when on paid plan

* Show the free usage panel only for v3 projects

* Concurrency page and more accurate tracking (#1252)

* Initial TaskRunConcurrencyTracker implementation

* MARQS calls a subscriber to events

* When enqueuing add the extra required metadata

* Track concurrency per environment for tasks too

* Admin page for global concurrency

* Use the new concurrency tracker on the tasks page

* Useful performance test task

* getAllTaskIdentifiers()

* New page for concurrency

* BackgroundWorkerTask index for quick lookup of task identifiers

* Added a way to get concurrency for environments

* Added upgrade/request more concurrency button

* Queued task column working

* Use defer and suspense

* Added queue column to the concurrency environments table

* Some comments added for clarity

* Fixed bad log message

* Sidemenu: move lower and rename to “Concurrency limits”

* Only show the environments, not tasks. Renamed to “Concurrency limits”

* v3: fix unfreezable state crashes for runs with multiple waits (#1253)

* support named capture groups

* write crash errors to attempt.error

* make restored pod names unique per checkpoint

* use last eight characters of checkpoint id instead

* add more chaos monkey env vars

* Ignore unfreezable states

* prevent excessive queue config parsing errors

* handle dependency resume edge case

* better entry point logging

* ignore checkpoint cancellation timeouts

* add missing idempotency keys to wait for dep replays

* remove checkpoints between attempts

* fix retry container names on kubernetes

* add changeset

* fix types

* bring back internal duration timers

* Added more logging to TaskRunConcurrencyTracker and some more try/catches

* Call subscriber.messageDequeued in dequeueMessageInSharedQueue

* Added messageReplaced to concurrency tracking (when freezing)

* Added depenenciesToBundle guide to bundle all packages

* Include the old message data when replacing, so we get the projectId etc.

* Fix restored container names

* Fix for schedule page not scrolling

* Added a description panel to the Concurrency admin page

* chore: Update version for release (beta) (#1256)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Release 3.0.0-beta.53

* Added a note to use batchTrigger() instead of trigger()

* The latest react-window-splitter fixes the ESM issues

* Set sensible defaults for the run page

* Deployments page

* Test page

* Schedules page

* Latest version of react-window-splitter (0.2.5)

* Updated to the latest version: react-window-splitter

* Callout if runs don’t start right away now has some top margin

* Small padding fix

* styled the handle focus state

* Added isStaticAtRest prop to resizable panel

* Updated resizable storybook

* Inline code blocks behave nicer when text wraps

* Added ElectricSQL to docker-compose, available on 3060

* Extracted some logic out of the eventRepository for getting a trace. This will be used on the frontend

* Use the new util

* More restructuring ready to use the trace summary from the frontend

* Using ElectricSQL for the run page data

* Min size for resizable panel on test page

* Don’t load the trace in the RunPresenter anymore

* Fix for the resizable panels on the run page

* Added overflow hidden to the panel group

* min size for the test page left hand panel

* Updated to latest window-splitter version

* Removed unused const

* One fix for client-server mismatch

* Slight improvement in the loading state

* Restructured the page so the loading is better

* Improvement to the loading states

* Improved the loading behaviour with the inspector

* WIP on auth, having problems with it

* Upgrade Remix to 2.9.1 (same as PR #1096)

* Switched structure around again so we only call the useTrace hook from the client

* Added auth to the sync

* Overscan more rows in the tree view

* Fix for TS error

* Remove duplicate import

* Revert "Upgrade Remix to 2.9.1 (same as PR #1096)"

This reverts commit e63ee9e.

* save cookie only when id is used

* Deployment table now scrolls

* removed imports

* A lot of changes to make the inspector live too… WIP

* More major overhauls to get the synced version of the run page working…

* If a span is completed show that

* Set the debounce much lower for selecting the span view

* Load the details run inspector data on demand

* Delete the SpanPresenter

* Use the async payload because it deals with superjson

* Fixed weird merge conflict

* Share some inspector timeline components

* A couple of layout tweaks

* Improved the run inspector loading states

* Fix for paragaph errors

* Fix for focusing on a span

* Undefined typre for useSyncedShape

* ELECTRIC_ORIGIN env var doesn’t have a default, added to the examples

* Updated @electric-sql/react package to the latest

* Fix the timeline duration stretching

* Added some better error handling for the electric sync

* More logging

* Better error when there are bad responses

* Turn off resizable snapshots, there’s a bug

* Added getSpan back

* Added SpanPresenter back

* Updated to the new Electric hooks package

* Made a copy so we have the old run page and the new electric one

* Put the main eventRepository back for now

---------

Co-authored-by: nicktrn <[email protected]>
Co-authored-by: James Ritchie <[email protected]>
Co-authored-by: James Ritchie <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
D-K-P added a commit that referenced this pull request Sep 13, 2024
commit c1690bd
Author: James Ritchie <[email protected]>
Date:   Fri Sep 13 16:46:52 2024 +0100

    New React to PDF example task (#1300)

    * Better code snippet to clarify your framework

    * New page for react-pdf

    * Added 3 more examples

    * New react to PDF example

    * Alphabeticalise the side menu

    * Removed github link and tweaked title

commit 9a7ad92
Author: Matt Aitken <[email protected]>
Date:   Fri Sep 13 11:45:24 2024 +0100

    ElectricSQL run page (hidden page for now) (#1297)

    * WIP on using react-window-splitter

    * WIP with new resizable panels and SSR

    * Use the cookie package

    * Resizable storybook page

    * Increase indexing memory limit

    * Fixed v2 usage meter displaying when on paid plan (#1255)

    * Fixed v2 usage meter displaying when on paid plan

    * Show the free usage panel only for v3 projects

    * Concurrency page and more accurate tracking (#1252)

    * Initial TaskRunConcurrencyTracker implementation

    * MARQS calls a subscriber to events

    * When enqueuing add the extra required metadata

    * Track concurrency per environment for tasks too

    * Admin page for global concurrency

    * Use the new concurrency tracker on the tasks page

    * Useful performance test task

    * getAllTaskIdentifiers()

    * New page for concurrency

    * BackgroundWorkerTask index for quick lookup of task identifiers

    * Added a way to get concurrency for environments

    * Added upgrade/request more concurrency button

    * Queued task column working

    * Use defer and suspense

    * Added queue column to the concurrency environments table

    * Some comments added for clarity

    * Fixed bad log message

    * Sidemenu: move lower and rename to “Concurrency limits”

    * Only show the environments, not tasks. Renamed to “Concurrency limits”

    * v3: fix unfreezable state crashes for runs with multiple waits (#1253)

    * support named capture groups

    * write crash errors to attempt.error

    * make restored pod names unique per checkpoint

    * use last eight characters of checkpoint id instead

    * add more chaos monkey env vars

    * Ignore unfreezable states

    * prevent excessive queue config parsing errors

    * handle dependency resume edge case

    * better entry point logging

    * ignore checkpoint cancellation timeouts

    * add missing idempotency keys to wait for dep replays

    * remove checkpoints between attempts

    * fix retry container names on kubernetes

    * add changeset

    * fix types

    * bring back internal duration timers

    * Added more logging to TaskRunConcurrencyTracker and some more try/catches

    * Call subscriber.messageDequeued in dequeueMessageInSharedQueue

    * Added messageReplaced to concurrency tracking (when freezing)

    * Added depenenciesToBundle guide to bundle all packages

    * Include the old message data when replacing, so we get the projectId etc.

    * Fix restored container names

    * Fix for schedule page not scrolling

    * Added a description panel to the Concurrency admin page

    * chore: Update version for release (beta) (#1256)

    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

    * Release 3.0.0-beta.53

    * Added a note to use batchTrigger() instead of trigger()

    * The latest react-window-splitter fixes the ESM issues

    * Set sensible defaults for the run page

    * Deployments page

    * Test page

    * Schedules page

    * Latest version of react-window-splitter (0.2.5)

    * Updated to the latest version: react-window-splitter

    * Callout if runs don’t start right away now has some top margin

    * Small padding fix

    * styled the handle focus state

    * Added isStaticAtRest prop to resizable panel

    * Updated resizable storybook

    * Inline code blocks behave nicer when text wraps

    * Added ElectricSQL to docker-compose, available on 3060

    * Extracted some logic out of the eventRepository for getting a trace. This will be used on the frontend

    * Use the new util

    * More restructuring ready to use the trace summary from the frontend

    * Using ElectricSQL for the run page data

    * Min size for resizable panel on test page

    * Don’t load the trace in the RunPresenter anymore

    * Fix for the resizable panels on the run page

    * Added overflow hidden to the panel group

    * min size for the test page left hand panel

    * Updated to latest window-splitter version

    * Removed unused const

    * One fix for client-server mismatch

    * Slight improvement in the loading state

    * Restructured the page so the loading is better

    * Improvement to the loading states

    * Improved the loading behaviour with the inspector

    * WIP on auth, having problems with it

    * Upgrade Remix to 2.9.1 (same as PR #1096)

    * Switched structure around again so we only call the useTrace hook from the client

    * Added auth to the sync

    * Overscan more rows in the tree view

    * Fix for TS error

    * Remove duplicate import

    * Revert "Upgrade Remix to 2.9.1 (same as PR #1096)"

    This reverts commit e63ee9e.

    * save cookie only when id is used

    * Deployment table now scrolls

    * removed imports

    * A lot of changes to make the inspector live too… WIP

    * More major overhauls to get the synced version of the run page working…

    * If a span is completed show that

    * Set the debounce much lower for selecting the span view

    * Load the details run inspector data on demand

    * Delete the SpanPresenter

    * Use the async payload because it deals with superjson

    * Fixed weird merge conflict

    * Share some inspector timeline components

    * A couple of layout tweaks

    * Improved the run inspector loading states

    * Fix for paragaph errors

    * Fix for focusing on a span

    * Undefined typre for useSyncedShape

    * ELECTRIC_ORIGIN env var doesn’t have a default, added to the examples

    * Updated @electric-sql/react package to the latest

    * Fix the timeline duration stretching

    * Added some better error handling for the electric sync

    * More logging

    * Better error when there are bad responses

    * Turn off resizable snapshots, there’s a bug

    * Added getSpan back

    * Added SpanPresenter back

    * Updated to the new Electric hooks package

    * Made a copy so we have the old run page and the new electric one

    * Put the main eventRepository back for now

    ---------

    Co-authored-by: nicktrn <[email protected]>
    Co-authored-by: James Ritchie <[email protected]>
    Co-authored-by: James Ritchie <[email protected]>
    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit a66db33
Author: Matt Aitken <[email protected]>
Date:   Thu Sep 12 15:08:37 2024 +0100

    If there’s no attempt and no worker when finalizing a run, don’t try to create an attempt

commit f90e23d
Author: Matt Aitken <[email protected]>
Date:   Thu Sep 12 15:08:06 2024 +0100

    Fix so failed cancels in the dashboard redirect back with an error message

commit 0cc5604
Author: Matt Aitken <[email protected]>
Date:   Thu Sep 12 14:27:42 2024 +0100

    batchTriggerAndWait checkpoint race condition when at max concurrency (#1296)

    * Ignore /packages/cli-v3/src/package.json

    * Added more logs when resuming a dependency, added the runId

    * A task for reproducing a race condition with checkpoints

    * Fix for doing remote image build when not self-hosting

    * Set team members, alerts and schedule limits to 100m for self-hosting

    * Import fix

    * Set the checkpointEventId in marqs when the checkpoint is created for batchTriggerAndWait

    This should fix a horrible race condition when at max concurrency

commit 67547d2
Author: Eric Allam <[email protected]>
Date:   Thu Sep 12 14:12:07 2024 +0100

    Remove payload from task run task events

commit a50063c
Author: Eric Allam <[email protected]>
Date:   Thu Sep 12 13:28:43 2024 +0100

    Always insert the dirs option when initializing a new project in the trigger.config.ts

commit 8c690a9
Author: Eric Allam <[email protected]>
Date:   Thu Sep 12 13:09:17 2024 +0100

    Make sure BuildManifest is exported from @trigger.dev/build