You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Auto merge of #1703 - sgrif:sg-heroku-release-phase, r=jtgeibel
Use the release phase to run migrations
When we deploy today, the current process is as follows:
- Build new code
- If build fails, release is aborted, failed build does not affect
cache
- Shut down all dynos
- At this point requests are being held by the router until the new
dyno starts up. The 30 second timeout still applies, so we have to
boot and process the queued requests in under 30 seconds or they
will get a 503
- Wait for shutdown to complete
- Run database migrations
- If this fails, our app crashes, all requests will return 500 until
we manually intervene
- Boot app
- If this fails, our app crashes, all requests will return 500 until
we manually intervene
- Requests are processed
With this change, the flow will now be:
- Build new code
- Run database migrations
- If this fails, the release will not be deployed. The build cache
will still contain the results of this build, since the build itself
succeeded.
- Shut down all dynos
- Wait for shutdown to complete
- Boot app
- Requests are processed
This has a few notable benefits:
- If a migration fails in production (this has happened a few times), we
will just not deploy the release instead of completely taking the app
down
- Since our migrations run in parallel with the app, we can make changes
which take a long time to run, as long as those changes don't lock the
table they're operating on
- Our app will boot more quickly, leading to shorter delays when we
deploy
This doesn't come for free of course. Since our app will still be
running with the migrations, we'll have to be careful about what we put
in a single deploy.
- Any destructive migrations will need to be deployed in at least two
phases. First we deploy code that works with both the old and new
schema, then we deploy the migration (along with code that only works
on the new schema) separately.
- We already follow this practice today in order to make sure that
each individual deploy can be rolled back quickly
- Migrations which perform long running changes will need to be sure
that those changes do not lock the table they're operating on
- We mostly already practice this today, since we don't want to delay
booting the app with a long migration
- Migrations need to run in a transaction to avoid leaving things in a
half migrated state if something fails
- Diesel already does this
Overall I think this tradeoff is worth it. Removing the risk of a failed
migration taking the app down is a huge win, and we already follow the
practices required to do this safely.
With this change, we may also want to consider enabling
[preboot](https://devcenter.heroku.com/articles/preboot), which ensures
that new dynos are running and accepting connections before shutting
down the old servers -- at the cost of making deploys take several
minutes longer.
0 commit comments