v3: self-hosting #1147

Merged: 44 commits, Jun 10, 2024
Commits
d0bb06a  add amin email regex env var (nicktrn, Apr 26, 2024)
ca97d89  fix displayed init command for self-hosted setups (nicktrn, Apr 26, 2024)
8330f40  shared env var to disable telemetry in cli and webapp (nicktrn, Apr 26, 2024)
518cd98  pin sdk version during init (nicktrn, Apr 26, 2024)
4c3076b  if specified, add api url to dev command shown after init (nicktrn, Apr 26, 2024)
29b3961  improve checkpoint support detection (nicktrn, Apr 26, 2024)
dad56aa  control forced checkpoint simulation via env var (nicktrn, Apr 26, 2024)
be083bd  add public init to providers (nicktrn, Apr 26, 2024)
a3b884b  better checkpoint support check for coordinator (nicktrn, Apr 26, 2024)
3efdf7c  add docker to coordinator image (nicktrn, Apr 26, 2024)
16b707f  update docker provider containerfile (nicktrn, Apr 26, 2024)
0bbcb82  bump remaining containers to node 20 (nicktrn, Apr 26, 2024)
e9c306c  Merge branch 'main' into v3/self-hosting (nicktrn, Apr 28, 2024)
8bdcaae  add infra image build to default publish workflow (nicktrn, Apr 28, 2024)
2043973  lockfile (nicktrn, Apr 28, 2024)
a625b8e  remove concurrency group from infra workflow (nicktrn, Apr 28, 2024)
43b0d30  add docker provider to build matrix (nicktrn, Apr 28, 2024)
b536573  fix var subst (nicktrn, Apr 28, 2024)
0b9c6ee  Merge branch 'main' into v3/self-hosting (nicktrn, Jun 5, 2024)
81ca34a  checkpoint test is docker specific (nicktrn, Jun 5, 2024)
3314169  enable v3 projects by default on self-hosted instances (nicktrn, Jun 5, 2024)
309608b  fix v3 setup command again (nicktrn, Jun 5, 2024)
5a062e4  add default posthog key (nicktrn, Jun 6, 2024)
baae1db  self-hosting docs (nicktrn, Jun 7, 2024)
298d240  Merge branch 'main' into v3/self-hosting (nicktrn, Jun 7, 2024)
701ea6f  add latest tags to versioned infra and webapp builds (nicktrn, Jun 7, 2024)
b5be579  some checkpoint errors should skip retrying (nicktrn, Jun 7, 2024)
0d39619  add changeset (nicktrn, Jun 7, 2024)
5ea561a  shorten paragraph (nicktrn, Jun 7, 2024)
1b355bd  some docs updates (nicktrn, Jun 7, 2024)
a4b311d  update tunnelling section (nicktrn, Jun 7, 2024)
3a0da69  add registry setup section (nicktrn, Jun 7, 2024)
53d6751  use correct cli push flag (nicktrn, Jun 7, 2024)
8265057  add checkout to v3 branch (nicktrn, Jun 7, 2024)
a62daf8  update the worker machine setup steps (nicktrn, Jun 7, 2024)
aa6d648  Merge branch 'main' into v3/self-hosting (nicktrn, Jun 7, 2024)
bf25c41  fix infra build (nicktrn, Jun 7, 2024)
4804b95  small docs update (nicktrn, Jun 7, 2024)
8bba64a  Merge branch 'main' into v3/self-hosting (nicktrn, Jun 7, 2024)
cfe0788  remove unused feature function (nicktrn, Jun 7, 2024)
41e8ab0  Revert "remove unused feature function" (nicktrn, Jun 7, 2024)
63b8a32  fix self-hosted v3 feature gate (nicktrn, Jun 7, 2024)
56dcb1e  add note about missing arm support (nicktrn, Jun 10, 2024)
b3db8ed  simplify helper script syntax (nicktrn, Jun 10, 2024)
9 changes: 9 additions & 0 deletions .changeset/spicy-terms-bow.md
@@ -0,0 +1,9 @@
+---
+"@trigger.dev/core-apps": patch
+"trigger.dev": patch
+---
+
+- Fix init command SDK pinning
+- Show --api-url / -a flag where needed
+- CLI now also respects `TRIGGER_TELEMETRY_DISABLED`
+- Dedicated docker checkpoint test function
2 changes: 2 additions & 0 deletions .env.example
@@ -25,6 +25,8 @@ DEV_OTEL_BATCH_PROCESSING_ENABLED="0"
 # OPTIONAL VARIABLES
 # This is used for validating emails that are allowed to log in. Every email that do not match this regex will be rejected.
 # WHITELISTED_EMAILS="authorized@yahoo\.com|authorized@gmail\.com"
+# Accounts with these emails will get global admin rights. This grants access to the admin UI.
+# ADMIN_EMAILS="admin@example\.com|another-admin@example\.com"
 # This is used for logging in via GitHub. You can leave these commented out if you don't want to use GitHub for authentication.
 # AUTH_GITHUB_CLIENT_ID=
 # AUTH_GITHUB_CLIENT_SECRET=
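
The `ADMIN_EMAILS` value in this hunk is a regular expression with `|` separating alternatives, in the same style as `WHITELISTED_EMAILS`. The sketch below shows one way such a pattern could be tested against a login email. The helper name `matchesEmailPattern` is hypothetical, and the anchoring is a choice made here for illustration; the webapp's actual matching code is not part of this hunk.

```typescript
// Hypothetical helper, not taken from the PR: tests an email against a
// pattern written in the ADMIN_EMAILS / WHITELISTED_EMAILS style.
function matchesEmailPattern(email: string, pattern: string | undefined): boolean {
  // An unset or empty variable matches nobody.
  if (!pattern) return false;

  // Assumption: wrap the pattern so each alternative must match the whole
  // address. Without anchors, "admin@example\.com" would also match
  // "notadmin@example.com".
  const anchored = new RegExp(`^(?:${pattern})$`);
  return anchored.test(email);
}

// Same value as the commented-out example in .env.example.
const adminPattern = "admin@example\\.com|another-admin@example\\.com";

console.log(matchesEmailPattern("admin@example.com", adminPattern)); // true
console.log(matchesEmailPattern("notadmin@example.com", adminPattern)); // false
console.log(matchesEmailPattern("admin@example.com", undefined)); // false
```

Whether the real check anchors the pattern is not visible here, so escaping dots and listing full addresses, as the example value does, is the safer habit either way.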
17 changes: 15 additions & 2 deletions .github/workflows/publish-docker.yml
@@ -39,11 +39,25 @@ jobs:
 exit 1
 fi
 echo "::set-output name=version::${IMAGE_TAG}"

 - name: 🔢 Get the commit hash
 id: get_commit
 run: |
 echo ::set-output name=sha_short::$(echo ${{ github.sha }} | cut -c1-7)
+
+- name: 📛 Set the tags
+id: set_tags
+run: |
+ref_without_tag=ghcr.io/triggerdotdev/trigger.dev
+image_tags=$ref_without_tag:${{ steps.get_version.outputs.version }}
+
+# if it's a versioned tag, also tag it as latest
+if [[ "${{ github.ref_name }}" == v.docker.* ]]; then
+image_tags=$image_tags,$ref_without_tag:latest
+fi
+
+echo "IMAGE_TAGS=${image_tags}" >> "$GITHUB_OUTPUT"
+
 - name: 🐙 Login to GitHub Container Registry
 uses: docker/login-action@v2
 with:
@@ -56,6 +70,5 @@ jobs:
 with:
 file: ./docker/Dockerfile
 platforms: linux/amd64,linux/arm64
-tags: |
-ghcr.io/triggerdotdev/trigger.dev:${{ steps.get_version.outputs.version }}
+tags: ${{ steps.set_tags.outputs.IMAGE_TAGS }}
 push: true
49 changes: 38 additions & 11 deletions .github/workflows/publish-infra.yml
@@ -1,6 +1,7 @@
 name: "🚢 Publish Infra Images"

 on:
+workflow_call:
 push:
 tags:
 - "infra-dev-*"
@@ -29,17 +30,14 @@ permissions:
 packages: write
 contents: read

-concurrency:
-group: ${{ github.workflow }}-${{ github.ref }}
-
 env:
 AWS_REGION: us-east-1

 jobs:
 build:
 strategy:
 matrix:
-package: [coordinator, kubernetes-provider]
+package: [coordinator, docker-provider, kubernetes-provider]
 runs-on: buildjet-16vcpu-ubuntu-2204
 env:
 DOCKER_BUILDKIT: "1"
@@ -48,20 +46,40 @@ jobs:

 - name: Generate image reference
 id: prep
-# WARNING: This step expects the workflow to have been triggered by a specific tag format of: infra-${env}-*
 run: |
-env=$(echo ${{ github.ref_name }} | cut -d- -f2)
-sha=${GITHUB_SHA::7}
-ts=$(date +%s)
+# set image repo
 if [[ "${{ matrix.package }}" == *-provider ]]; then
-provider_type=$(echo ${{ matrix.package }} | cut -d- -f1)
+provider_type=$(echo "${{ matrix.package }}" | cut -d- -f1)
 repository=provider/${provider_type}
 else
-repository=${{ matrix.package }}
+repository="${{ matrix.package }}"
 fi
-echo "IMAGE_TAG=${env}-${sha}-${ts}" >> "$GITHUB_OUTPUT"
 echo "REPOSITORY=${repository}" >> "$GITHUB_OUTPUT"
+
+# set image tag
+if [[ "${{ github.ref_type }}" == "tag" ]]; then
+if [[ "${{ github.ref_name }}" == infra-*-* ]]; then
+env=$(echo ${{ github.ref_name }} | cut -d- -f2)
+sha=$(echo ${{ github.sha }} | head -c7)
+ts=$(date +%s)
+image_tag=${env}-${sha}-${ts}
+elif [[ "${{ github.ref_name }}" == v.docker.* ]]; then
+version="${GITHUB_REF_NAME#v.docker.}"
+image_tag="v${version}"
+elif [[ "${{ github.ref_name }}" == build-* ]]; then
+image_tag="${GITHUB_REF_NAME#build-}"
+else
+echo "Invalid tag: ${{ github.ref_name }}"
+exit 1
+fi
+elif [[ "${{ github.ref_name }}" == "main" ]]; then
+image_tag="main"
+else
+echo "Invalid reference: ${{ github.ref }}"
+exit 1
+fi
+echo "IMAGE_TAG=${image_tag}" >> "$GITHUB_OUTPUT"

 - name: Set up Docker Buildx
 uses: docker/setup-buildx-action@v3

@@ -92,3 +110,12 @@ jobs:
 REGISTRY: ghcr.io/triggerdotdev
 REPOSITORY: ${{ steps.prep.outputs.REPOSITORY }}
 IMAGE_TAG: ${{ steps.prep.outputs.IMAGE_TAG }}
+
+- name: 🐙 Push 'latest' to GitHub Container Registry
+if: startsWith(github.ref_name, 'v.docker.')
+run: |
+docker tag infra_image $REGISTRY/$REPOSITORY:latest
+docker push $REGISTRY/$REPOSITORY:latest
+env:
+REGISTRY: ghcr.io/triggerdotdev
+REPOSITORY: ${{ steps.prep.outputs.REPOSITORY }}
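
The "set image tag" branch in this workflow maps the triggering ref to an image tag in four ways. As a reading aid, the same decision table can be sketched in TypeScript. The function name `deriveImageTag` and its parameters are hypothetical; the workflow itself does this in shell, and the timestamp is passed in here only to keep the sketch deterministic.

```typescript
// Hypothetical TypeScript mirror of the "set image tag" shell logic.
type RefType = "tag" | "branch";

function deriveImageTag(
  refType: RefType,
  refName: string,
  sha: string,
  nowSeconds: number
): string {
  if (refType === "tag") {
    // infra-<env>-<anything>  ->  <env>-<short sha>-<unix timestamp>
    if (/^infra-.*-.*$/.test(refName)) {
      const env = refName.split("-")[1]; // same field as `cut -d- -f2`
      return `${env}-${sha.slice(0, 7)}-${nowSeconds}`;
    }
    // v.docker.<version>  ->  v<version>
    if (refName.startsWith("v.docker.")) {
      return `v${refName.slice("v.docker.".length)}`;
    }
    // build-<name>  ->  <name>
    if (refName.startsWith("build-")) {
      return refName.slice("build-".length);
    }
    throw new Error(`Invalid tag: ${refName}`);
  }

  // The only non-tag ref the workflow accepts is the main branch.
  if (refName === "main") return "main";
  throw new Error(`Invalid reference: ${refName}`);
}

console.log(deriveImageTag("tag", "infra-dev-1", "0123456789abcdef", 1700000000)); // dev-0123456-1700000000
console.log(deriveImageTag("tag", "v.docker.3.0.0", "0123456789abcdef", 1700000000)); // v3.0.0
console.log(deriveImageTag("tag", "build-test", "0123456789abcdef", 1700000000)); // test
console.log(deriveImageTag("branch", "main", "0123456789abcdef", 1700000000)); // main
```

For `v.docker.*` tags the workflow additionally pushes a `latest` tag in the final step, which the sketch does not model.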
5 changes: 5 additions & 0 deletions .github/workflows/publish.yml
@@ -57,3 +57,8 @@ jobs:
 needs: [typecheck, units]
 uses: ./.github/workflows/publish-docker.yml
 secrets: inherit
+
+publish-infra:
+needs: [typecheck, units]
+uses: ./.github/workflows/publish-infra.yml
+secrets: inherit
8 changes: 4 additions & 4 deletions apps/coordinator/Containerfile
@@ -1,19 +1,19 @@
 # syntax=docker/dockerfile:labs

-FROM node:18-bullseye-slim@sha256:a4edd54dcfdcacc8a4100fee71498e8671d99556a1acf5614539214a70092426 AS node-18
+FROM node:20-bookworm-slim@sha256:72f2f046a5f8468db28730b990b37de63ce93fd1a72a40f531d6aa82afdf0d46 AS node-20

 WORKDIR /app

-FROM node-18 AS pruner
+FROM node-20 AS pruner

 COPY --chown=node:node . .
 RUN npx -q [email protected] prune --scope=coordinator --docker
 RUN find . -name "node_modules" -type d -prune -exec rm -rf '{}' +

-FROM node-18 AS base
+FROM node-20 AS base

 RUN apt-get update \
-&& apt-get install -y buildah ca-certificates dumb-init \
+&& apt-get install -y buildah ca-certificates dumb-init docker.io \
 && rm -rf /var/lib/apt/lists/*

 COPY --chown=node:node .gitignore .gitignore
92 changes: 45 additions & 47 deletions apps/coordinator/src/index.ts
@@ -12,7 +12,7 @@ import {
 } from "@trigger.dev/core/v3";
 import { ZodNamespace } from "@trigger.dev/core/v3/zodNamespace";
 import { ZodSocketConnection } from "@trigger.dev/core/v3/zodSocket";
-import { HttpReply, getTextBody, SimpleLogger } from "@trigger.dev/core-apps";
+import { HttpReply, getTextBody, SimpleLogger, testDockerCheckpoint } from "@trigger.dev/core-apps";
 import { ExponentialBackoff } from "./backoff";

 import { collectDefaultMetrics, register, Gauge } from "prom-client";
@@ -72,7 +72,10 @@ type CheckpointAndPushOptions = {

 type CheckpointAndPushResult =
 | { success: true; checkpoint: CheckpointData }
-| { success: false; reason?: "CANCELED" | "DISABLED" | "ERROR" | "IN_PROGRESS" | "NO_SUPPORT" };
+| {
+success: false;
+reason?: "CANCELED" | "DISABLED" | "ERROR" | "IN_PROGRESS" | "NO_SUPPORT" | "SKIP_RETRYING";
+};

 type CheckpointData = {
 location: string;
@@ -125,65 +128,53 @@ class Checkpointer {

 constructor(private opts = { forceSimulate: false }) {}

-async initialize(): Promise<CheckpointerInitializeReturn> {
+async init(): Promise<CheckpointerInitializeReturn> {
 if (this.#initialized) {
-return this.#getInitializeReturn();
+return this.#getInitReturn(this.#canCheckpoint);
 }

 this.#logger.log(`${this.#dockerMode ? "Docker" : "Kubernetes"} mode`);

 if (this.#dockerMode) {
-try {
-await $`criu --version`;
-} catch (error) {
-this.#logger.error("No checkpoint support: Missing CRIU binary");
-this.#logger.error("Will simulate instead");
-this.#canCheckpoint = false;
-this.#initialized = true;
+const testCheckpoint = await testDockerCheckpoint();

-return this.#getInitializeReturn();
+if (testCheckpoint.ok) {
+return this.#getInitReturn(true);
 }

-try {
-await $`docker checkpoint`;
-} catch (error) {
-this.#logger.error(
-"No checkpoint support: Docker needs to have experimental features enabled"
-);
-this.#logger.error("Will simulate instead");
-this.#canCheckpoint = false;
-this.#initialized = true;
-
-return this.#getInitializeReturn();
-}
+this.#logger.error(testCheckpoint.message, testCheckpoint.error ?? "");
+return this.#getInitReturn(false);
 } else {
 try {
 await $`buildah login --get-login ${REGISTRY_HOST}`;
 } catch (error) {
 this.#logger.error(`No checkpoint support: Not logged in to registry ${REGISTRY_HOST}`);
-this.#canCheckpoint = false;
-this.#initialized = true;
-
-return this.#getInitializeReturn();
+return this.#getInitReturn(false);
 }
 }

-this.#logger.log(
-`Full checkpoint support${
-this.#dockerMode && this.opts.forceSimulate ? " with forced simulation enabled." : "!"
-}`
-);
+return this.#getInitReturn(true);
+}

+#getInitReturn(canCheckpoint: boolean): CheckpointerInitializeReturn {
 this.#initialized = true;
-this.#canCheckpoint = true;
+this.#canCheckpoint = canCheckpoint;

-return this.#getInitializeReturn();
-}
+if (canCheckpoint) {
+this.#logger.log("Full checkpoint support!");
+}
+
+const willSimulate = this.#dockerMode && (!this.#canCheckpoint || this.opts.forceSimulate);
+
+if (willSimulate) {
+this.#logger.log("Simulation mode enabled. Containers will be paused, not checkpointed.", {
+forceSimulate: this.opts.forceSimulate,
+});
+}

-#getInitializeReturn(): CheckpointerInitializeReturn {
 return {
-canCheckpoint: this.#canCheckpoint,
-willSimulate: this.#dockerMode && (!this.#canCheckpoint || this.opts.forceSimulate),
+canCheckpoint,
+willSimulate,
 };
 }
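
The value returned by `#getInitReturn` depends on three booleans: Docker mode, checkpoint support, and the forced-simulation option. The sketch below isolates that expression as a pure function so the combinations can be read off directly. The names `resolveInitReturn` and `InitReturn` are hypothetical stand-ins and are not part of the PR.

```typescript
// Hypothetical stand-in for CheckpointerInitializeReturn.
type InitReturn = { canCheckpoint: boolean; willSimulate: boolean };

// Same boolean expression as in #getInitReturn, without logging or state.
function resolveInitReturn(
  dockerMode: boolean,
  canCheckpoint: boolean,
  forceSimulate: boolean
): InitReturn {
  // Simulation only applies in Docker mode, either because real
  // checkpoints are unavailable or because simulation is forced.
  const willSimulate = dockerMode && (!canCheckpoint || forceSimulate);
  return { canCheckpoint, willSimulate };
}

// Docker without checkpoint support: falls back to simulation.
console.log(resolveInitReturn(true, false, false)); // { canCheckpoint: false, willSimulate: true }
// Docker with support and forced simulation: both flags are true.
console.log(resolveInitReturn(true, true, true)); // { canCheckpoint: true, willSimulate: true }
// Kubernetes without support: neither flag is set.
console.log(resolveInitReturn(false, false, true)); // { canCheckpoint: false, willSimulate: false }
```

The coordinator's socket handlers later combine the two flags as `canCheckpoint || willSimulate`, so only the last case above disables checkpoint-and-restore entirely.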

@@ -327,6 +318,11 @@ class Checkpointer {
 return result;
 }

+if (result.reason === "SKIP_RETRYING") {
+this.#logger.log("Skipping retrying", { runId });
+return result;
+}
+
 continue;
 } catch (error) {
 this.#logger.error("Checkpoint error", {
@@ -355,7 +351,7 @@ class Checkpointer {
 projectRef,
 deploymentVersion,
 }: CheckpointAndPushOptions): Promise<CheckpointAndPushResult> {
-await this.initialize();
+await this.init();

 const options = {
 runId,
@@ -473,7 +469,8 @@ class Checkpointer {

 // Create checkpoint (CRI)
 if (!this.#canCheckpoint) {
-throw new Error("No checkpoint support in kubernetes mode.");
+this.#logger.error("No checkpoint support in kubernetes mode.");
+return { success: false, reason: "SKIP_RETRYING" };
 }

 const containerId = this.#logger.debug(
@@ -484,7 +481,8 @@ class Checkpointer {
 );

 if (!containerId.stdout) {
-throw new Error("could not find container id");
+this.#logger.error("could not find container id", { options, containterName });
+return { success: false, reason: "SKIP_RETRYING" };
 }

 const start = performance.now();
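
The two hunks above replace thrown errors with a `SKIP_RETRYING` result, and the earlier retry-loop hunk returns as soon as it sees that reason. A minimal sketch of that control flow follows. The function `checkpointWithRetries`, the `maxAttempts` parameter, and the simplified `AttemptResult` type are assumptions made for illustration; the coordinator's real loop is only partly visible in this diff and handles more cases.

```typescript
// Simplified, hypothetical version of CheckpointAndPushResult.
type FailureReason =
  | "CANCELED"
  | "DISABLED"
  | "ERROR"
  | "IN_PROGRESS"
  | "NO_SUPPORT"
  | "SKIP_RETRYING";

type AttemptResult =
  | { success: true; checkpoint: { location: string } }
  | { success: false; reason?: FailureReason };

// Hypothetical retry wrapper: retries failed attempts, except when the
// attempt reports that retrying cannot help.
async function checkpointWithRetries(
  attempt: () => Promise<AttemptResult>,
  maxAttempts: number
): Promise<AttemptResult> {
  for (let i = 0; i < maxAttempts; i++) {
    const result = await attempt();

    if (result.success) return result;

    // Permanent failures (no checkpoint support, missing container id)
    // are returned immediately instead of being retried.
    if (result.reason === "SKIP_RETRYING") return result;

    // Any other failure falls through to the next attempt.
  }

  return { success: false, reason: "ERROR" };
}
```

Returning a result instead of throwing lets the loop tell "this will never work" apart from a transient error, which a plain `catch` block cannot do without inspecting the error.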
@@ -617,7 +615,7 @@ class TaskCoordinator {
 private host = "0.0.0.0"
 ) {
 this.#httpServer = this.#createHttpServer();
-this.#checkpointer.initialize();
+this.#checkpointer.init();
 this.#delayThresholdInMs = this.#getDelayThreshold();

 if (process.env.DELAY_THRESHOLD_IN_MS) {
@@ -1034,7 +1032,7 @@ class TaskCoordinator {
 return;
 }

-const { canCheckpoint, willSimulate } = await this.#checkpointer.initialize();
+const { canCheckpoint, willSimulate } = await this.#checkpointer.init();

 const willCheckpointAndRestore = canCheckpoint || willSimulate;

@@ -1131,7 +1129,7 @@ class TaskCoordinator {
 return;
 }

-const { canCheckpoint, willSimulate } = await this.#checkpointer.initialize();
+const { canCheckpoint, willSimulate } = await this.#checkpointer.init();

 const willCheckpointAndRestore = canCheckpoint || willSimulate;

@@ -1185,7 +1183,7 @@ class TaskCoordinator {
 socket.on("WAIT_FOR_TASK", async (message, callback) => {
 logger.log("[WAIT_FOR_TASK]", message);

-const { canCheckpoint, willSimulate } = await this.#checkpointer.initialize();
+const { canCheckpoint, willSimulate } = await this.#checkpointer.init();

 const willCheckpointAndRestore = canCheckpoint || willSimulate;

@@ -1227,7 +1225,7 @@ class TaskCoordinator {
 socket.on("WAIT_FOR_BATCH", async (message, callback) => {
 logger.log("[WAIT_FOR_BATCH]", message);

-const { canCheckpoint, willSimulate } = await this.#checkpointer.initialize();
+const { canCheckpoint, willSimulate } = await this.#checkpointer.init();

 const willCheckpointAndRestore = canCheckpoint || willSimulate;
