Skip to content

Remove the id column from version_downloads #1694

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 1, 2019

Conversation

sgrif
Copy link
Contributor

@sgrif sgrif commented Mar 22, 2019

⚠️ This PR contains destructive migrations. Each commit must be deployed separately ⚠️

The version_downloads table and its indexes account for a whopping 69% of our total database size. The primary key alone is nearly 1GB, or 25% of our total database. This is never really used for anything besides fetching in batches in update-downloads. We could just do batching based on crate_id and date, but I've opted to remove the batching entirely. We usually process somewhere between 2k and 3k rows every time it runs -- we'd have to process over 50k rows to even consume 1MB of memory, so we can just load them all at once.

With our code changed to never remove the column, we can just completely remove it. This brings the size of the table down to 1600MB total, or 1256MB after a full vacuum.

Since Diesel always runs migrations in a transaction, and we can't add an index concurrently inside a transaction, the down.sql included is only for development/tests. If we need to roll this migration back in production, the column will need to be added/populated and the indexes created manually in a way that doesn't lock the table. If we were to run the blocking version included in down.sql against production, the table would be locked for multiple minutes, which would take some pages down and make the download endpoint painfully slow.

I don't anticipate needing to roll this migration back.

@bors
Copy link
Contributor

bors commented Mar 28, 2019

☔ The latest upstream changes (presumably #1699) made this pull request unmergeable. Please resolve the merge conflicts.

sgrif added 2 commits April 1, 2019 10:00
The `id` column will be removed in the next commit
This column is no longer being used, and can be safely removed. Rolling
back this migration in production requires creating indexes
concurrently, which Diesel CLI doesn't support, so rolling back this
migration in production will require queries to be run manually. A
blocking version is included for development and testing
@sgrif
Copy link
Contributor Author

sgrif commented Apr 1, 2019

@bors r=jtgeibel

@bors
Copy link
Contributor

bors commented Apr 1, 2019

📌 Commit fbfb1b0 has been approved by jtgeibel

@sgrif sgrif force-pushed the sg-remove-pk-from-version-downloads branch from fbfb1b0 to 86ba333 Compare April 1, 2019 16:00
@sgrif
Copy link
Contributor Author

sgrif commented Apr 1, 2019

@bors r=jtgeibel

@bors
Copy link
Contributor

bors commented Apr 1, 2019

📌 Commit 86ba333 has been approved by jtgeibel

@bors
Copy link
Contributor

bors commented Apr 1, 2019

⌛ Testing commit 86ba333 with merge 59a4ca9...

bors added a commit that referenced this pull request Apr 1, 2019
…geibel

Remove the `id` column from `version_downloads`

:warning: This PR contains destructive migrations. Each commit must be deployed separately :warning:

The `version_downloads` table and its indexes account for a whopping 69% of our total database size. The primary key alone is nearly 1GB, or 25% of our total database. This is never really used for anything besides fetching in batches in `update-downloads`. We could just do batching based on crate_id and date, but I've opted to remove the batching entirely. We usually process somewhere between 2k and 3k rows every time it runs -- we'd have to process over 50k rows to even consume 1MB of memory, so we can just load them all at once.

With our code changed to never remove the column, we can just completely remove it. This brings the size of the table down to 1600MB total, or 1256MB after a full vacuum.

Since Diesel always runs migrations in a transaction, and we can't add an index concurrently inside a transaction, the `down.sql` included is only for development/tests. If we need to roll this migration back in production, the column will need to be added/populated and the indexes created manually in a way that doesn't lock the table. If we were to run the blocking version included in down.sql against production, the table would be locked for multiple minutes, which would take some pages down and make the download endpoint painfully slow.

I don't anticipate needing to roll this migration back.
@bors
Copy link
Contributor

bors commented Apr 1, 2019

☀️ Test successful - checks-travis
Approved by: jtgeibel
Pushing 59a4ca9 to master...

@bors bors merged commit 86ba333 into rust-lang:master Apr 1, 2019
@sgrif sgrif deleted the sg-remove-pk-from-version-downloads branch May 14, 2019 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants