Remove the `id` column from `version_downloads` #1694

sgrif · 2019-03-22T18:50:28Z

⚠️ This PR contains destructive migrations. Each commit must be deployed separately ⚠️

The version_downloads table and its indexes account for a whopping 69% of our total database size. The primary key alone is nearly 1GB, or 25% of our total database. This is never really used for anything besides fetching in batches in update-downloads. We could just do batching based on crate_id and date, but I've opted to remove the batching entirely. We usually process somewhere between 2k and 3k rows every time it runs -- we'd have to process over 50k rows to even consume 1MB of memory, so we can just load them all at once.

With our code changed to never remove the column, we can just completely remove it. This brings the size of the table down to 1600MB total, or 1256MB after a full vacuum.

Since Diesel always runs migrations in a transaction, and we can't add an index concurrently inside a transaction, the down.sql included is only for development/tests. If we need to roll this migration back in production, the column will need to be added/populated and the indexes created manually in a way that doesn't lock the table. If we were to run the blocking version included in down.sql against production, the table would be locked for multiple minutes, which would take some pages down and make the download endpoint painfully slow.

I don't anticipate needing to roll this migration back.

bors · 2019-03-28T18:50:56Z

☔ The latest upstream changes (presumably #1699) made this pull request unmergeable. Please resolve the merge conflicts.

The `id` column will be removed in the next commit

This column is no longer being used, and can be safely removed. Rolling back this migration in production requires creating indexes concurrently, which Diesel CLI doesn't support, so rolling back this migration in production will require queries to be run manually. A blocking version is included for development and testing

sgrif · 2019-04-01T16:00:27Z

@bors r=jtgeibel

bors · 2019-04-01T16:00:27Z

📌 Commit fbfb1b0 has been approved by jtgeibel

sgrif · 2019-04-01T16:14:34Z

@bors r=jtgeibel

bors · 2019-04-01T16:14:35Z

📌 Commit 86ba333 has been approved by jtgeibel

bors · 2019-04-01T16:17:41Z

⌛ Testing commit 86ba333 with merge 59a4ca9...

…geibel Remove the `id` column from `version_downloads` :warning: This PR contains destructive migrations. Each commit must be deployed separately :warning: The `version_downloads` table and its indexes account for a whopping 69% of our total database size. The primary key alone is nearly 1GB, or 25% of our total database. This is never really used for anything besides fetching in batches in `update-downloads`. We could just do batching based on crate_id and date, but I've opted to remove the batching entirely. We usually process somewhere between 2k and 3k rows every time it runs -- we'd have to process over 50k rows to even consume 1MB of memory, so we can just load them all at once. With our code changed to never remove the column, we can just completely remove it. This brings the size of the table down to 1600MB total, or 1256MB after a full vacuum. Since Diesel always runs migrations in a transaction, and we can't add an index concurrently inside a transaction, the `down.sql` included is only for development/tests. If we need to roll this migration back in production, the column will need to be added/populated and the indexes created manually in a way that doesn't lock the table. If we were to run the blocking version included in down.sql against production, the table would be locked for multiple minutes, which would take some pages down and make the download endpoint painfully slow. I don't anticipate needing to roll this migration back.

bors · 2019-04-01T16:34:58Z

☀️ Test successful - checks-travis
Approved by: jtgeibel
Pushing 59a4ca9 to master...

jtgeibel approved these changes Apr 1, 2019

View reviewed changes

sgrif added 2 commits April 1, 2019 10:00

Use (version_id, date) to identify version_downloads

3c92ca2

The `id` column will be removed in the next commit

sgrif force-pushed the sg-remove-pk-from-version-downloads branch from fbfb1b0 to 86ba333 Compare April 1, 2019 16:00

bors merged commit 86ba333 into rust-lang:master Apr 1, 2019

sgrif deleted the sg-remove-pk-from-version-downloads branch May 14, 2019 16:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Remove the `id` column from `version_downloads` #1694

Remove the `id` column from `version_downloads` #1694

Uh oh!

sgrif commented Mar 22, 2019

Uh oh!

bors commented Mar 28, 2019

Uh oh!

sgrif commented Apr 1, 2019

Uh oh!

bors commented Apr 1, 2019

Uh oh!

sgrif commented Apr 1, 2019

Uh oh!

bors commented Apr 1, 2019

Uh oh!

bors commented Apr 1, 2019

Uh oh!

bors commented Apr 1, 2019

Uh oh!

Uh oh!

Remove the id column from version_downloads #1694

Remove the id column from version_downloads #1694

Uh oh!

Conversation

sgrif commented Mar 22, 2019

Uh oh!

bors commented Mar 28, 2019

Uh oh!

sgrif commented Apr 1, 2019

Uh oh!

bors commented Apr 1, 2019

Uh oh!

sgrif commented Apr 1, 2019

Uh oh!

bors commented Apr 1, 2019

Uh oh!

bors commented Apr 1, 2019

Uh oh!

bors commented Apr 1, 2019

Uh oh!

Uh oh!

Remove the `id` column from `version_downloads` #1694

Remove the `id` column from `version_downloads` #1694