Auto merge of #1703 - sgrif:sg-heroku-release-phase, r=jtgeibel

bors · bors · commit 101a935edaa5 · 2019-05-01T23:26:06.000Z
Use the release phase to run migrations

When we deploy today, the current process is as follows:

- Build new code
  - If the build fails, the release is aborted; a failed build does not affect the cache
- Shut down all dynos
  - At this point requests are being held by the router until the new dyno starts up. The 30 second timeout still applies, so we have to boot and process the queued requests in under 30 seconds or they will get a 503
- Wait for shutdown to complete
- Run database migrations
  - If this fails, our app crashes; all requests will return 500 until we manually intervene
- Boot app
  - If this fails, our app crashes; all requests will return 500 until we manually intervene
- Requests are processed

With this change, the flow will now be:

- Build new code
- Run database migrations
  - If this fails, the release will not be deployed. The build cache will still contain the results of this build, since the build itself succeeded.
- Shut down all dynos
- Wait for shutdown to complete
- Boot app
- Requests are processed

This has a few notable benefits:

- If a migration fails in production (this has happened a few times), we will just not deploy the release instead of completely taking the app down
- Since our migrations run in parallel with the app, we can make changes which take a long time to run, as long as those changes don't lock the table they're operating on
- Our app will boot more quickly, leading to shorter delays when we deploy

This doesn't come for free, of course. Since our app will still be running while the migrations run, we'll have to be careful about what we put in a single deploy.

- Any destructive migrations will need to be deployed in at least two phases. First we deploy code that works with both the old and new schema, then we deploy the migration (along with code that only works on the new schema) separately.
  - We already follow this practice today in order to make sure that each individual deploy can be rolled back quickly
- Migrations which perform long-running changes will need to be sure that those changes do not lock the table they're operating on
  - We mostly already practice this today, since we don't want to delay booting the app with a long migration
- Migrations need to run in a transaction to avoid leaving things in a half-migrated state if something fails
  - Diesel already does this

Overall I think this tradeoff is worth it. Removing the risk of a failed migration taking the app down is a huge win, and we already follow the practices required to do this safely.

With this change, we may also want to consider enabling [preboot](https://devcenter.heroku.com/articles/preboot), which ensures that new dynos are running and accepting connections before shutting down the old servers -- at the cost of making deploys take several minutes longer.
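
To make the difference between the two flows concrete, here is a rough simulation of the step ordering described above. It is only a sketch: `Deploy`, `deploy_old`, and `deploy_new` are hypothetical names made up for illustration, each step is reduced to a callable that returns success or failure, and nothing here talks to Heroku or Diesel.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Step = Callable[[], bool]  # a step returns True on success, False on failure


@dataclass
class Deploy:
    """Records what happened during one simulated deploy (hypothetical model)."""
    steps: List[str] = field(default_factory=list)
    released: bool = False     # did the new code go live?
    app_serving: bool = True   # is some version of the app answering requests?


def deploy_old(build: Step, migrate: Step, boot: Step) -> Deploy:
    """Old flow: migrations run as part of the web process, after shutdown."""
    d = Deploy()
    d.steps.append("build")
    if not build():
        return d               # release aborted, old dynos keep serving
    d.steps.append("shutdown dynos")
    d.app_serving = False      # requests are now queued by the router
    d.steps.append("migrate")
    if not migrate():
        return d               # app is down until someone intervenes
    d.steps.append("boot")
    if not boot():
        return d               # app is down until someone intervenes
    d.app_serving = True
    d.released = True
    return d


def deploy_new(build: Step, migrate: Step, boot: Step) -> Deploy:
    """New flow: migrations run in the release phase, before shutdown."""
    d = Deploy()
    d.steps.append("build")
    if not build():
        return d               # release aborted, old dynos keep serving
    d.steps.append("migrate")
    if not migrate():
        return d               # release not deployed, old dynos keep serving
    d.steps.append("shutdown dynos")
    d.app_serving = False
    d.steps.append("boot")
    if not boot():
        return d               # a failed boot can still take the app down
    d.app_serving = True
    d.released = True
    return d


def succeed() -> bool:
    return True


def fail() -> bool:
    return False


if __name__ == "__main__":
    # Compare both flows when the migration step fails.
    for flow in (deploy_old, deploy_new):
        result = flow(succeed, fail, succeed)
        print(flow.__name__, result.steps, "serving:", result.app_serving)
```

In this model a failing migration leaves the old flow with no app serving requests, while the new flow stops before any dyno is shut down. A failing boot is equally harmful in both flows, which matches the description above.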
diff --git a/Procfile b/Procfile
@@ -1,2 +1,3 @@
-web: bin/diesel migration run && bin/start-nginx ./target/release/server
+release: bin/diesel migration run
+web: bin/start-nginx ./target/release/server
 background_worker: ./target/release/background-worker