Commit 13faa69

Merge remote-tracking branch 'origin/main' into fix/resume-restore-bugs
2 parents 12ad920 + ef7f112 commit 13faa69

4 files changed: +266 -3 lines changed
docs/context.mdx

Lines changed: 4 additions & 2 deletions
````diff
@@ -9,9 +9,9 @@ Context (`ctx`) is a way to get information about a run.
 The context object does not change whilst your code is executing. This means values like `ctx.run.durationMs` will be fixed at the moment the `run()` function is called.
 </Note>
 
-Here's an example:
+<RequestExample>
 
-```typescript
+```typescript Context example
 import { task } from "@trigger.dev/sdk/v3";
 
 export const parentTask = task({
@@ -25,6 +25,8 @@ export const parentTask = task({
 });
 ```
 
+</RequestExample>
+
 ## Context properties
 
 <ResponseField name="task" type="object">
````
Lines changed: 260 additions & 0 deletions
@@ -0,0 +1,260 @@
---
title: "Scrape the top 3 articles from Hacker News and email yourself a summary every weekday"
sidebarTitle: "Scrape Hacker News"
description: "This example demonstrates how to scrape the top 3 articles from Hacker News using BrowserBase and Puppeteer, summarize them with ChatGPT and send a nicely formatted email summary to yourself every weekday using Resend."
---

import LocalDevelopment from "/snippets/local-development-extensions.mdx";
import ScrapingWarning from "/snippets/web-scraping-warning.mdx";

## Overview

In this example we'll be using a number of different tools and features to:

1. Scrape the content of the top 3 articles from Hacker News
2. Summarize each article
3. Email the summaries to yourself

And we'll be using the following tools and features:

- [Schedules](/tasks/scheduled) to run the task every weekday at 9 AM
- [Batch Triggering](/triggering#yourtask-batchtriggerandwait) to run separate child tasks for each article while the parent task waits for them all to complete
- [idempotencyKey](/triggering#idempotencykey) to prevent tasks being triggered multiple times
- [BrowserBase](https://browserbase.com/) to proxy the scraping of the Hacker News articles
- [Puppeteer](https://pptr.dev/) to scrape the articles linked from Hacker News
- [OpenAI](https://platform.openai.com/docs/overview) to summarize the articles
- [Resend](https://resend.com/) to send a nicely formatted email summary
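
The weekday schedule is written in the task code as the cron pattern `0 9 * * 1-5`: minute 0, hour 9, any day of the month, any month, Monday to Friday. As a rough illustration of how such a pattern is read, here is a minimal sketch of a matcher. `LocalTime`, `fieldMatches` and `cronMatches` are hypothetical names, not part of the Trigger.dev SDK, and the sketch only handles `*`, single values, ranges and comma-separated lists.

```typescript
// Hypothetical type for illustration: a wall-clock time already converted
// to the schedule's time zone.
interface LocalTime {
  minute: number; // 0-59
  hour: number; // 0-23
  dayOfMonth: number; // 1-31
  month: number; // 1-12
  dayOfWeek: number; // 0 = Sunday ... 6 = Saturday
}

// Returns true when one cron field ("*", "9", "1-5", "1,3,5") accepts the value.
function fieldMatches(field: string, value: number): boolean {
  if (field === "*") return true;
  return field.split(",").some((part) => {
    const bounds = part.split("-").map(Number);
    const start = bounds[0];
    const end = bounds.length > 1 ? bounds[1] : start;
    return value >= start && value <= end;
  });
}

// Simplified matcher: every field must accept the corresponding value.
function cronMatches(pattern: string, time: LocalTime): boolean {
  const fields = pattern.trim().split(/\s+/);
  if (fields.length !== 5) {
    throw new Error(`Expected 5 cron fields, got ${fields.length}`);
  }
  const [minute, hour, dayOfMonth, month, dayOfWeek] = fields;
  return (
    fieldMatches(minute, time.minute) &&
    fieldMatches(hour, time.hour) &&
    fieldMatches(dayOfMonth, time.dayOfMonth) &&
    fieldMatches(month, time.month) &&
    fieldMatches(dayOfWeek, time.dayOfWeek)
  );
}
```

With this sketch, a Monday at 09:00 (`dayOfWeek: 1`) matches the pattern, while the same time on a Saturday (`dayOfWeek: 6`) does not.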

<ScrapingWarning />

## Prerequisites

- A project with [Trigger.dev initialized](/quick-start)
- [Puppeteer](https://pptr.dev/guides/installation) installed on your machine
- A [BrowserBase](https://browserbase.com/) account
- An [OpenAI](https://platform.openai.com/docs/overview) account
- A [Resend](https://resend.com/) account

## Build configuration

First up, add these build settings to your `trigger.config.ts` file:

```tsx trigger.config.ts
import { defineConfig } from "@trigger.dev/sdk/v3";
import { puppeteer } from "@trigger.dev/build/extensions/puppeteer";

export default defineConfig({
  project: "<project ref>",
  // Your other config settings...
  build: {
    // This is required to use the Puppeteer library
    extensions: [puppeteer()],
  },
});
```

Learn more about [build configurations](/config/config-file#build-configuration) including setting default retry settings, customizing the build environment, and more.

### Environment variables

Set the following environment variables in your local `.env` file to run this task locally. Before deploying your task, set them in the [Trigger.dev dashboard](/deploy-environment-variables) or [using the SDK](/deploy-environment-variables#in-your-code):

```bash
BROWSERBASE_API_KEY="<your BrowserBase API key>"
OPENAI_API_KEY="<your OpenAI API key>"
RESEND_API_KEY="<your Resend API key>"
```
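
The task code reads these three keys from `process.env`. A minimal sketch of a check that reports which of them are unset, assuming a hypothetical helper `findMissingEnvVars` (it is not part of any SDK used in this example):

```typescript
// The three variables this example expects to find in the environment.
const REQUIRED_ENV_VARS = [
  "BROWSERBASE_API_KEY",
  "OPENAI_API_KEY",
  "RESEND_API_KEY",
];

// Hypothetical helper: returns the names of required variables that are
// missing or empty in the given environment object.
function findMissingEnvVars(
  env: Record<string, string | undefined>
): string[] {
  return REQUIRED_ENV_VARS.filter((name) => !env[name]);
}

// Possible usage at the top of a task file (shown as a comment so the
// sketch stays independent of Node typings):
// const missing = findMissingEnvVars(process.env);
// if (missing.length > 0) throw new Error(`Missing: ${missing.join(", ")}`);
```

Failing early with the list of missing names is one option; logging a warning instead would also work if some keys are optional in your setup.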

### Task code

```tsx trigger/scrape-hacker-news.tsx
import { render } from "@react-email/render";
import { logger, schedules, task, wait } from "@trigger.dev/sdk/v3";
import { OpenAI } from "openai";
import puppeteer from "puppeteer-core";
import { Resend } from "resend";
import { HNSummaryEmail } from "./summarize-hn-email";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const resend = new Resend(process.env.RESEND_API_KEY);

// Parent task (scheduled to run 9AM every weekday)
export const summarizeHackerNews = schedules.task({
  id: "summarize-hacker-news",
  cron: {
    pattern: "0 9 * * 1-5",
    timezone: "Europe/London",
  }, // Run at 9 AM, Monday to Friday
  run: async () => {
    // Connect to BrowserBase to proxy the scraping of the Hacker News articles
    const browser = await puppeteer.connect({
      browserWSEndpoint: `wss://connect.browserbase.com?apiKey=${process.env.BROWSERBASE_API_KEY}`,
    });
    logger.info("Connected to Browserbase");

    const page = await browser.newPage();

    // Navigate to Hacker News and scrape top 3 articles
    await page.goto("https://news.ycombinator.com/news", {
      waitUntil: "networkidle0",
    });
    logger.info("Navigated to Hacker News");

    const articles = await page.evaluate(() => {
      const items = document.querySelectorAll(".athing");
      return Array.from(items)
        .slice(0, 3)
        .map((item) => {
          const titleElement = item.querySelector(".titleline > a");
          const link = titleElement?.getAttribute("href");
          const title = titleElement?.textContent;
          return { title, link };
        });
    });
    logger.info("Scraped top 3 articles", { articles });

    await browser.close();
    await wait.for({ seconds: 5 });

    // Use batchTriggerAndWait to process articles
    const summaries = await scrapeAndSummarizeArticle
      .batchTriggerAndWait(
        articles.map((article) => ({
          payload: { title: article.title!, link: article.link! },
          // idempotencyKey is a run option, so it goes under `options`
          options: { idempotencyKey: article.link! },
        }))
      )
      .then((batch) =>
        batch.runs.filter((run) => run.ok).map((run) => run.output)
      );

    // Send email using Resend
    await resend.emails.send({
      from: "Hacker News Summary <[email protected]>",
      to: ["<your email address>"], // Replace with your own address
      subject: "Your morning HN summary",
      // Awaiting is harmless if your version of render() is synchronous
      html: await render(<HNSummaryEmail articles={summaries} />),
    });

    logger.info("Email sent successfully");
  },
});

// Child task for scraping and summarizing individual articles
export const scrapeAndSummarizeArticle = task({
  id: "scrape-and-summarize-articles",
  retry: {
    maxAttempts: 3,
    minTimeoutInMs: 5000,
    maxTimeoutInMs: 10000,
    factor: 2,
    randomize: true,
  },
  run: async ({ title, link }: { title: string; link: string }) => {
    logger.info(`Summarizing ${title}`);

    const browser = await puppeteer.connect({
      browserWSEndpoint: `wss://connect.browserbase.com?apiKey=${process.env.BROWSERBASE_API_KEY}`,
    });
    const page = await browser.newPage();

    // Prevent all assets from loading, images, stylesheets etc
    await page.setRequestInterception(true);
    page.on("request", (request) => {
      if (
        ["script", "stylesheet", "image", "media", "font"].includes(
          request.resourceType()
        )
      ) {
        request.abort();
      } else {
        request.continue();
      }
    });

    await page.goto(link, { waitUntil: "networkidle0" });
    logger.info(`Navigated to article: ${title}`);

    // Extract the main content of the article
    const content = await page.evaluate(() => {
      const articleElement = document.querySelector("article") || document.body;
      return articleElement.innerText.trim().slice(0, 1500); // Limit to 1500 characters
    });

    await browser.close();

    logger.info(`Extracted content for article: ${title}`, { content });

    // Summarize the content using ChatGPT
    const response = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [
        {
          role: "user",
          content: `Summarize this article in 2-3 concise sentences:\n\n${content}`,
        },
      ],
    });

    logger.info(`Generated summary for article: ${title}`);

    return {
      title,
      link,
      summary: response.choices[0].message.content,
    };
  },
});
```
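
The parent task keeps only the runs whose `ok` flag is true before building the email, so a failed child run simply drops out of the summary. A minimal sketch of that filtering step in isolation, using a simplified, hypothetical `RunResult` type as a stand-in for the SDK's batch result shape (the real type is assumed to carry more fields):

```typescript
// Hypothetical, simplified stand-in for one entry of a batch result.
type RunResult<T> =
  | { ok: true; id: string; output: T }
  | { ok: false; id: string; error: unknown };

// Collects the outputs of successful runs, preserving their order.
// Checking `run.ok` inside the loop lets TypeScript narrow the union,
// so `run.output` is only read on the successful variant.
function successfulOutputs<T>(runs: RunResult<T>[]): T[] {
  const outputs: T[] = [];
  for (const run of runs) {
    if (run.ok) {
      outputs.push(run.output);
    }
  }
  return outputs;
}
```

If you would rather know when an article was skipped, the same loop is a natural place to log the `id` and `error` of each failed run.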

## Create your email template using React Email

To prevent the main example from becoming too cluttered, we'll create a separate file for our email template. It's formatted using [React Email](https://react.email/docs/introduction) components so you'll need to install the package to use it.

Notice how this file is imported into the main task code and passed to Resend to send the email.

```tsx summarize-hn-email.tsx
import {
  Html,
  Head,
  Body,
  Container,
  Section,
  Heading,
  Text,
  Link,
} from "@react-email/components";

interface Article {
  title: string;
  link: string;
  summary: string | null;
}

export const HNSummaryEmail: React.FC<{ articles: Article[] }> = ({
  articles,
}) => (
  <Html>
    <Head />
    <Body style={{ fontFamily: "Arial, sans-serif", padding: "20px" }}>
      <Container>
        <Heading as="h1">Your Morning HN Summary</Heading>
        {articles.map((article, index) => (
          <Section key={index} style={{ marginBottom: "20px" }}>
            <Heading as="h3">
              <Link href={article.link}>{article.title}</Link>
            </Heading>
            <Text>{article.summary || "No summary available"}</Text>
          </Section>
        ))}
      </Container>
    </Body>
  </Html>
);
```

<LocalDevelopment packages={"the Puppeteer library"} />

## Testing your task

To test this task in the dashboard, use the Test page and set the schedule date to "Now" to ensure the task triggers immediately. Then click "Run test" and wait for the task to complete.

docs/mint.json

Lines changed: 1 addition & 0 deletions
````diff
@@ -310,6 +310,7 @@
 "guides/examples/open-ai-with-retrying",
 "guides/examples/pdf-to-image",
 "guides/examples/puppeteer",
+"guides/examples/scrape-hacker-news",
 "guides/examples/sharp-image-processing",
 "guides/examples/supabase-database-operations",
 "guides/examples/supabase-storage-upload",
````
Lines changed: 1 addition & 1 deletion
````diff
@@ -1,3 +1,3 @@
 <Warning>
-**WEB SCRAPING:** When web scraping, you MUST use a proxy to comply with our terms of service. Direct scraping of third-party websites without the site owner's permission using Trigger.dev Cloud is prohibited and will result in account suspension. See [this example](/guides/examples/puppeteer#scrape-content-from-a-web-page) using a proxy.
+**WEB SCRAPING:** When web scraping, you MUST use a proxy to comply with our terms of service. Direct scraping of third-party websites without the site owner's permission using Trigger.dev Cloud is prohibited and will result in account suspension. See [this example](/guides/examples/puppeteer#scrape-content-from-a-web-page) which uses a proxy.
 </Warning>
````
