Skip to content

Ensure the update_downloads job doesn't run concurrently #2157

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 6, 2020

Conversation

jtgeibel
Copy link
Member

If multiple instances of this job are run concurrently then it is
possible to overcount downloads, at least temporarily. The job first
selects all matching version_downloads and later uses those values to
calculate how many downloads to add to versions and crates. If a
second job is run, it would select some rows from version_downloads
that were already queued for processing by the first task.

If an overcount were to occur, the next time the job is run it should
calculate a negative adjustment and correct the situation. There's no
point in doing extra work and if we eventually need concurrency we
should built that out intentionally. Therefore, this commit wraps the
entire job in a transaction and obtains an transaction level advisory
lock from the database.

If the lock has already been taken the job will fail and will be retried
by swirl. If the duration of this job begins to approach the scheduling
interval, then we will want to increase that interval to avoid
triggering alerts.

If multiple instances of this job are run concurrently then it is
possible to overcount downloads, at least temporarily.  The job first
selects all matching `version_downloads` and later uses those values to
calculate how many downloads to add to `versions` and `crates`.  If a
second job is run, it would select some rows from `version_downloads`
that were already queued for processing by the first task.

If an overcount were to occur, the next time the job is run it should
calculate a negative adjustment and correct the situation.  There's no
point in doing extra work and if we eventually need concurrency we
should built that out intentionally.  Therefore, this commit wraps the
entire job in a transaction and obtains an transaction level advisory
lock from the database.

If the lock has already been taken the job will fail and will be retried
by swirl.  If the duration of this job begins to approach the scheduling
interval, then we will want to increase that interval to avoid
triggering alerts.
@rust-highfive
Copy link

r? @sgrif

(rust_highfive has picked a reviewer for you, use r? to override)

@jtgeibel
Copy link
Member Author

jtgeibel commented Feb 6, 2020

r? @smarnach

@rust-highfive rust-highfive assigned smarnach and unassigned sgrif Feb 6, 2020
@sgrif
Copy link
Contributor

sgrif commented Feb 6, 2020

@bors r+

@bors
Copy link
Contributor

bors commented Feb 6, 2020

📌 Commit 0b03ae6 has been approved by sgrif

@bors
Copy link
Contributor

bors commented Feb 6, 2020

⌛ Testing commit 0b03ae6 with merge c07223b...

bors added a commit that referenced this pull request Feb 6, 2020
Ensure the update_downloads job doesn't run concurrently

If multiple instances of this job are run concurrently then it is
possible to overcount downloads, at least temporarily.  The job first
selects all matching `version_downloads` and later uses those values to
calculate how many downloads to add to `versions` and `crates`.  If a
second job is run, it would select some rows from `version_downloads`
that were already queued for processing by the first task.

If an overcount were to occur, the next time the job is run it should
calculate a negative adjustment and correct the situation.  There's no
point in doing extra work and if we eventually need concurrency we
should built that out intentionally.  Therefore, this commit wraps the
entire job in a transaction and obtains an transaction level advisory
lock from the database.

If the lock has already been taken the job will fail and will be retried
by swirl.  If the duration of this job begins to approach the scheduling
interval, then we will want to increase that interval to avoid
triggering alerts.
@bors
Copy link
Contributor

bors commented Feb 6, 2020

☀️ Test successful - checks-travis
Approved by: sgrif
Pushing c07223b to master...

@bors bors merged commit 0b03ae6 into rust-lang:master Feb 6, 2020
jtgeibel added a commit to jtgeibel/crates.io that referenced this pull request Feb 15, 2020
It appears that the additional wrapper transaction added around the
update_downloads task causes delays and timeouts to download requests
whenever the background job is run.  Reverting so that master can be
deployed.

Revert "Auto merge of rust-lang#2157 - jtgeibel:add-lock-to-update-downloads-job, r=sgrif"

This reverts commit c07223b, reversing
changes made to c6d13eb.
bors added a commit that referenced this pull request Feb 15, 2020
Revert changes to update_downloads task in #2157

It appears that the additional wrapper transaction added around the
update_downloads task causes delays and timeouts to download requests
whenever the background job is run.  Reverting so that master can be
deployed.

r? @ghost
bors added a commit that referenced this pull request Feb 15, 2020
Revert changes to update_downloads task in #2157

It appears that the additional wrapper transaction added around the
update_downloads task causes delays and timeouts to download requests
whenever the background job is run.  Reverting so that master can be
deployed.

r? @ghost
@jtgeibel jtgeibel deleted the add-lock-to-update-downloads-job branch March 8, 2020 22:37
jtgeibel added a commit to jtgeibel/crates.io that referenced this pull request Mar 10, 2020
This is an improved implementation of rust-lang#2157.  The previous design
relied on a transaction based lock to manage the lifetime of the lock.
The wrapper transaction caused the `update_downloads` job to interfere
with incoming download requests, and the changes had to be reverted.

This implementation uses a session lock which is automatically released
even if the callback panics.

If multiple instances of the `update_downloads` job are run
concurrently then it is possible to over count downloads, at least
temporarily.  The first job selects all matching `version_downloads`
and later uses those values to calculate how many downloads to add to
`versions` and `crates`.  If a second job is run, it would select some
rows from `version_downloads` that were already queued for processing
by the first task.

If an over count were to occur, the next time the job is run it should
calculate a negative adjustment and correct the situation.  There's no
point in doing extra work and if we eventually need concurrency we
should built that out intentionally.  Therefore, this commit wraps the
entire job in a transaction and obtains an transaction level advisory
lock from the database.

If the lock has already been taken the job will fail and will be
retried by swirl.  If the duration of this job begins to approach the
scheduling interval, then we will want to increase that interval to
avoid triggering alerts.
jtgeibel added a commit to jtgeibel/crates.io that referenced this pull request Mar 22, 2020
This is an improved implementation of rust-lang#2157.  The previous design
relied on a transaction based lock to manage the lifetime of the lock.
The wrapper transaction caused the `update_downloads` job to interfere
with incoming download requests, and the changes had to be reverted.

This implementation uses a session lock which is automatically released
even if the callback panics.

If multiple instances of the `update_downloads` job are run
concurrently then it is possible to over count downloads, at least
temporarily.  The first job selects all matching `version_downloads`
and later uses those values to calculate how many downloads to add to
`versions` and `crates`.  If a second job is run, it would select some
rows from `version_downloads` that were already queued for processing
by the first task.

If an over count were to occur, the next time the job is run it should
calculate a negative adjustment and correct the situation.  There's no
point in doing extra work and if we eventually need concurrency we
should built that out intentionally.  Therefore, this commit wraps the
entire job in a transaction and obtains an transaction level advisory
lock from the database.

If the lock has already been taken the job will fail and will be
retried by swirl.  If the duration of this job begins to approach the
scheduling interval, then we will want to increase that interval to
avoid triggering alerts.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants