Use the release phase to run migrations #1703
Merged
Conversation
When we deploy today, the current process is as follows:

- Build new code
  - If the build fails, the release is aborted and the failed build does not affect the cache
- Shut down all dynos
  - At this point requests are being held by the router until the new dyno starts up. The 30-second timeout still applies, so we have to boot and process the queued requests in under 30 seconds or they will get a 503
- Wait for shutdown to complete
- Run database migrations
  - If this fails, our app crashes and all requests will return 500 until we manually intervene
- Boot app
  - If this fails, our app crashes and all requests will return 500 until we manually intervene
- Requests are processed

With this change, the flow will now be:

- Build new code
- Run database migrations (via Heroku's release phase; see the Procfile sketch below)
  - If this fails, the release will not be deployed. The build cache will still contain the results of this build, since the build itself succeeded.
- Shut down all dynos
- Wait for shutdown to complete
- Boot app
- Requests are processed

This has a few notable benefits:

- If a migration fails in production (this has happened a few times), we will just not deploy the release instead of completely taking the app down
- Since our migrations run in parallel with the app, we can make changes which take a long time to run, as long as those changes don't lock the table they're operating on
- Our app will boot more quickly, leading to shorter delays when we deploy

This doesn't come for free, of course. Since our app will still be running while the migrations run, we'll have to be careful about what we put in a single deploy.

- Any destructive migrations will need to be deployed in at least two phases. First we deploy code that works with both the old and new schema, then we deploy the migration (along with code that only works on the new schema) separately (see the migration sketch below).
  - We already follow this practice today in order to make sure that each individual deploy can be rolled back quickly
- Migrations which perform long-running changes will need to make sure that those changes do not lock the table they're operating on
  - We mostly already practice this today, since we don't want to delay booting the app with a long migration
- Migrations need to run in a transaction to avoid leaving things in a half-migrated state if something fails
  - Diesel already does this

Overall I think this tradeoff is worth it. Removing the risk of a failed migration taking the app down is a huge win, and we already follow the practices required to do this safely.

With this change, we may also want to consider enabling [preboot](https://devcenter.heroku.com/articles/preboot), which ensures that new dynos are running and accepting connections before shutting down the old servers -- at the cost of making deploys take several minutes longer.
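For reference, Heroku's release phase is configured through a `release` process type in the Procfile; the command runs after a successful build and a non-zero exit status prevents the new release from being deployed. A minimal sketch of what that could look like — the process names and commands here are assumptions, not necessarily what this repository actually ships:

```
# Procfile (sketch; the actual process names and commands in this repo may differ)
release: diesel migration run        # run pending migrations; a non-zero exit aborts the release
web: ./target/release/server         # hypothetical web process, started only once the release phase succeeds
```

If the migration command fails, Heroku keeps the previous release running, which is what turns a failed migration from "app is down" into "release is not deployed".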
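As an illustration of the two-phase approach to destructive changes, here is a hedged sketch using a made-up `legacy_field` column on the `crates` table; the column, timestamps, and file names are hypothetical, not an actual migration in this repository:

```sql
-- Deploy 1: ship application code that no longer reads or writes
-- crates.legacy_field, so it works against both the old and the new schema.
-- This deploy contains no migration.

-- Deploy 2: migrations/<timestamp>_drop_legacy_field/up.sql
ALTER TABLE crates DROP COLUMN legacy_field;

-- migrations/<timestamp>_drop_legacy_field/down.sql, so this deploy can
-- still be rolled back on its own
ALTER TABLE crates ADD COLUMN legacy_field TEXT;
```

Because Diesel wraps each migration in a transaction, a failure partway through deploy 2 leaves the schema untouched rather than half-migrated.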
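If we do decide to enable preboot, that is a one-off switch on the Heroku app rather than a code change; per the linked article it is enabled with the Heroku CLI (the app name below is a placeholder):

```
heroku features:enable preboot --app <app-name>
```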
Looks good to me.
LGTM! @bors r+
📌 Commit e46a1a5 has been approved by
bors added a commit that referenced this pull request on May 1, 2019
Use the release phase to run migrations
☀️ Test successful - checks-travis