-
-
Notifications
You must be signed in to change notification settings - Fork 742
v4: fix race condition when continuing run when blocked at the same time #2073
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Caution Review failedThe pull request is closed. WalkthroughA new Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant RunEngine
participant RaceSimulationSystem
participant WaitpointSystem
Client->>RunEngine: trigger(runId, organizationId)
RunEngine->>WaitpointSystem: blockRunWithWaitpoint({runId, ..., organizationId})
WaitpointSystem->>RaceSimulationSystem: waitForRacepoint({runId})
RaceSimulationSystem-->>WaitpointSystem: (waits or resolves)
WaitpointSystem->>WaitpointSystem: process waitpoints, update state
Client->>RunEngine: registerRacepointForRun({runId, waitInterval})
RunEngine->>RaceSimulationSystem: registerRacepointForRun({runId, waitInterval})
RaceSimulationSystem->>RaceSimulationSystem: create and resolve racepoint after interval
Poem
Note ⚡️ AI Code Reviews for VS Code, Cursor, WindsurfCodeRabbit now has a plugin for VS Code, Cursor and Windsurf. This brings AI code reviews directly in the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback. Note ⚡️ Faster reviews with cachingCodeRabbit now supports caching for code and dependencies, helping speed up reviews. This means quicker feedback, reduced wait times, and a smoother review experience overall. Cached data is encrypted and stored securely. This feature will be automatically enabled for all accounts on May 16th. To opt out, configure 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🔭 Outside diff range comments (2)
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (2)
498-513
:⚠️ Potential issueRacepoint sync must precede waitpoint query to avoid lost blockers
continueRunIfUnblocked
fetches the list of waitpoints before callingwaitForRacepoint
.
If another thread inserts a new blocking waitpoint between the SELECT and the wait, the current thread will:
- Miss the newly-inserted waitpoint.
- Conclude the run is unblocked.
- Wrongly resume the run, re-introducing the very race we are trying to solve.
Move the race-point wait above the query so that the read view includes all inserts made prior to the sync.
- // 1. Get the any blocking waitpoints - const blockingWaitpoints = await this.$.prisma.taskRunWaitpoint.findMany({ - where: { taskRunId: runId }, - select: { - id: true, - batchId: true, - batchIndex: true, - waitpoint: { - select: { id: true, status: true }, - }, - }, - }); - - await this.$.raceSimulationSystem.waitForRacepoint({ runId }); + // Synchronise first to guarantee a consistent view + await this.$.raceSimulationSystem.waitForRacepoint({ runId }); + + // 1. Get any blocking waitpoints + const blockingWaitpoints = await this.$.prisma.taskRunWaitpoint.findMany({ + where: { taskRunId: runId }, + select: { + id: true, + batchId: true, + batchIndex: true, + waitpoint: { + select: { id: true, status: true }, + }, + }, + });
352-369
: 💡 Verification agent🧩 Analysis chain
Add-on parameter requires call-site audit
blockRunWithWaitpoint
now mandatesorganizationId
.
Please verify every invocation of this method across the code-base (tests, helpers, scheduled jobs, etc.) has been updated; otherwise the compiler/TS-server will complain, but dynamicany
usage might silently passundefined
.
🏁 Script executed:
#!/bin/bash # Finds all call-sites still passing 6 args (old signature) instead of ≥7. rg -n $'blockRunWithWaitpoint\\(' | awk -F':' '{print $1":"$2}'Length of output: 2690
🏁 Script executed:
#!/bin/bash set -eu echo "Call-sites missing 'organizationId' in the argument object:" for file in $(rg -l "blockRunWithWaitpoint\("); do if ! rg -q "blockRunWithWaitpoint\(\{[^}]*organizationId" "$file"; then rg -n "blockRunWithWaitpoint\(" "$file" fi doneLength of output: 5375
Audit and update all
blockRunWithWaitpoint
call-sites to includeorganizationId
blockRunWithWaitpoint
now requires anorganizationId
field in its argument object. The following call-sites are still missing it; please update each to pass the correctorganizationId
from scope:• internal-packages/run-engine/src/engine/index.ts
– lines 524, 909, 994
• internal-packages/run-engine/src/engine/tests/releaseConcurrency.test.ts
– lines 108, 159, 308, 360, 496, 548, 683, 845, 1027, 1191, 1345
• internal-packages/run-engine/src/engine/tests/triggerAndWait.test.ts (line 368)
• internal-packages/run-engine/src/engine/tests/waitpoints.test.ts
– lines 98, 211, 352, 491, 617, 759, 909, 1069, 1224, 1423
• internal-packages/run-engine/src/engine/tests/waitpointRace.test.ts (lines 71, 90)
• internal-packages/run-engine/src/engine/tests/checkpoints.test.ts
– lines 98, 354, 404, 557, 851
• apps/webapp/app/routes/engine.v1.runs.$runFriendlyId.wait.duration.ts (line 48)
• apps/webapp/app/routes/engine.v1.runs.$runFriendlyId.waitpoints.tokens.$waitpointFriendlyId.wait.ts (line 40)
• apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts (line 71)Each object literal passed into
blockRunWithWaitpoint({ … })
must now includeorganizationId: /* your org ID variable */
before merging.
🧹 Nitpick comments (4)
internal-packages/run-engine/src/engine/systems/raceSimulationSystem.ts (2)
4-4
: PreferMap
over plain object to avoid prototype-pollution pitfalls
Record<string, Promise<void>>
will work for UUID-ish runIds, but using aMap
avoids collisions with__proto__
,hasOwnProperty
, … and provides cleaner semantics.- private racepoints: Record<string, Promise<void> | undefined> = {}; + private racepoints = new Map<string, Promise<void>>();
6-7
: Remove the empty constructorThe synthetic constructor adds byte-code without value and was flagged by Biome.
- constructor() {}
🧰 Tools
🪛 Biome (1.9.4)
[error] 6-6: This constructor is unnecessary.
Unsafe fix: Remove the unnecessary constructor.
(lint/complexity/noUselessConstructor)
internal-packages/run-engine/src/engine/tests/waitpointRace.test.ts (1)
104-106
: Avoid console noise in automated test runs
console.log("joinRow", joinRow);
will pollute CI output and can mask real issues when the suite scales.- console.log("joinRow", joinRow);
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (1)
667-679
: Optional: Delete waitpoint links inside the lockThe cleanup of
TaskRunWaitpoint
rows happens after the run-level lock is released.
Two concurrentcontinueRunIfUnblocked
jobs could both pass the blockers-completed check, then attempt to delete the same rows, causing redundant work or lost updates.Consider performing the deletion inside the
runLock
or, at minimum, guarding it withON CONFLICT DO NOTHING
/IF EXISTS
semantics.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
📒 Files selected for processing (5)
internal-packages/run-engine/src/engine/index.ts
(7 hunks)internal-packages/run-engine/src/engine/systems/raceSimulationSystem.ts
(1 hunks)internal-packages/run-engine/src/engine/systems/systems.ts
(2 hunks)internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
(7 hunks)internal-packages/run-engine/src/engine/tests/waitpointRace.test.ts
(1 hunks)
🧰 Additional context used
🪛 Biome (1.9.4)
internal-packages/run-engine/src/engine/systems/raceSimulationSystem.ts
[error] 6-6: This constructor is unnecessary.
Unsafe fix: Remove the unnecessary constructor.
(lint/complexity/noUselessConstructor)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
- GitHub Check: units / 🧪 Unit Tests
- GitHub Check: typecheck / typecheck
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (2)
internal-packages/run-engine/src/engine/systems/systems.ts (1)
8-19
: ✅ Resource wiring looks correct
RaceSimulationSystem
is imported and exposed throughSystemResources
.
All downstream systems that require the new dependency can now pick it up via DI, so there is no immediate concern in this slice.internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (1)
379-381
: Race-simulation hook added in the right placeSynchronising on
this.$.raceSimulationSystem.waitForRacepoint
before acquiring the run lock looks correct and should eliminate the original race condition.
This fixes an issue where a run could get stuck when two things happened with the following exact timing:
The fix for this was simply to only delete blocking TaskRunWaitpoint records that participated in the blocking waitpoint guard at the beginning of continueRunIfUnblocked
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores