Skip to content

Don't lock more than 1 row at a time in update-downloads #1345

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 13, 2018

Conversation

sgrif
Copy link
Contributor

@sgrif sgrif commented Apr 13, 2018

Prior to this commit, bin/update-downloads will hold a lock on every
row it updates for the entire batch of 1000. Each row gets locked when
it is updated, but the lock isn't released until the whole transaction
is comitted. This means that the first row in the batch is locked for
the whole transaction, but the last row in the batch isn't locked longer
than it should be.

Processing 1000 rows is routinely taking more than 30 seconds to
complete. Whenever it does, this causes requests to time out. Even when
it doesn't, this locking behavior is the reason that our 99th percentile
response time is getting to be so bad.

The endpoint affected by this is /crates/{name}/{version}/download. In
other words, this bug is causing cargo install to fail. Even when it
doesn't, this bug is causing it to take seconds when it should take
milliseconds.

By establishing a transaction per row instead of per batch, we retain
atomicity, without holding a lock for the whole batch. To keep
everything atomic, we need to update the global counter per row instead
of per batch.

This relies on the invariant that no rows in version_downloads get
touched once date != CURRENT_DATE other than by this script.

I believe this is also the root cause of #1304, and that we can maybe
revert #1312

Prior to this commit, `bin/update-downloads` will hold a lock on every
row it updates for the entire batch of 1000. Each row gets locked when
it is `updated`, but the lock isn't released until the whole transaction
is comitted. This means that the first row in the batch is locked for
the whole transaction, but the last row in the batch isn't locked longer
than it should be.

Processing 1000 rows is routinely taking more than 30 seconds to
complete. Whenever it does, this causes requests to time out. Even when
it doesn't, this locking behavior is the reason that our 99th percentile
response time is getting to be so bad.

The endpoint affected by this is `/crates/{name}/{version}/download`. In
other words, this bug is causing `cargo install` to fail. Even when it
doesn't, this bug is causing it to take seconds when it should take
milliseconds.

By establishing a transaction per row instead of per batch, we retain
atomicity, without holding a lock for the whole batch. To keep
everything atomic, we need to update the global counter per row instead
of per batch.

This relies on the invariant that no rows in `version_downloads` get
touched once `date != CURRENT_DATE` other than by this script.

I believe this is also the root cause of rust-lang#1304, and that we can maybe
revert rust-lang#1312
@sgrif sgrif requested a review from ashleygwilliams April 13, 2018 16:01
@sgrif
Copy link
Contributor Author

sgrif commented Apr 13, 2018

bors: r+

bors-voyager bot added a commit that referenced this pull request Apr 13, 2018
1345: Don't lock more than 1 row at a time in `update-downloads` r=sgrif

Prior to this commit, `bin/update-downloads` will hold a lock on every
row it updates for the entire batch of 1000. Each row gets locked when
it is `updated`, but the lock isn't released until the whole transaction
is comitted. This means that the first row in the batch is locked for
the whole transaction, but the last row in the batch isn't locked longer
than it should be.

Processing 1000 rows is routinely taking more than 30 seconds to
complete. Whenever it does, this causes requests to time out. Even when
it doesn't, this locking behavior is the reason that our 99th percentile
response time is getting to be so bad.

The endpoint affected by this is `/crates/{name}/{version}/download`. In
other words, this bug is causing `cargo install` to fail. Even when it
doesn't, this bug is causing it to take seconds when it should take
milliseconds.

By establishing a transaction per row instead of per batch, we retain
atomicity, without holding a lock for the whole batch. To keep
everything atomic, we need to update the global counter per row instead
of per batch.

This relies on the invariant that no rows in `version_downloads` get
touched once `date != CURRENT_DATE` other than by this script.

I believe this is also the root cause of #1304, and that we can maybe
revert #1312
@sgrif
Copy link
Contributor Author

sgrif commented Apr 13, 2018

bors: r-

(clippy lint failed, CI passed otherwise, just going to manually merge after fixing)

@bors-voyager
Copy link
Contributor

bors-voyager bot commented Apr 13, 2018

Canceled

@sgrif sgrif merged commit 86cc3d4 into rust-lang:master Apr 13, 2018
@sgrif sgrif deleted the sg-dont-lock-downloads branch April 14, 2018 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants