Commit c66f98d

Added another puppeteer example
1 parent 1d8502e commit c66f98d

3 files changed: 95 additions & 15 deletions

docs/config/config-file.mdx

Lines changed: 30 additions & 1 deletion
@@ -4,6 +4,7 @@ sidebarTitle: "Configuration"
 description: "This file is used to configure your project and how it's built."
 ---
 
+import ScrapingWarning from "/snippets/web-scraping-warning.mdx";
 import BundlePackages from "/snippets/bundle-packages.mdx";
 
 The `trigger.config.ts` file is used to configure your Trigger.dev project. It is a TypeScript file at the root of your project that exports a default configuration object. Here's an example:
@@ -473,6 +474,34 @@ export default defineConfig({
 });
 ```
 
+#### puppeteer
+
+<ScrapingWarning />
+
+To use Puppeteer in your project, add these build settings to your `trigger.config.ts` file:
+
+```ts trigger.config.ts
+import { defineConfig } from "@trigger.dev/sdk/v3";
+
+export default defineConfig({
+  project: "<project ref>",
+  // Your other config settings...
+  build: {
+    extensions: [puppeteer()],
+  },
+});
+```
+
+Then add the following environment variable on the Environment Variables page of your Trigger.dev dashboard:
+
+```bash
+PUPPETEER_EXECUTABLE_PATH="/usr/bin/google-chrome-stable"
+```
+
+<Note>
+  Ensure you use `puppeteer`, not `puppeteer-core`, in your build configuration.
+</Note>
+
 #### ffmpeg
 
 You can add the `ffmpeg` build extension to your build process:
@@ -482,7 +511,7 @@ import { defineConfig } from "@trigger.dev/sdk/v3";
 import { ffmpeg } from "@trigger.dev/build/extensions/core";
 
 export default defineConfig({
-  //..other stuff
+  // Your other config settings...
   build: {
     extensions: [ffmpeg()],
   },
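
The `puppeteer` section added to this file calls `puppeteer()` without showing where the function comes from. A minimal complete config might look like the sketch below. The import path `@trigger.dev/build/extensions/puppeteer` is an assumption modeled on the `ffmpeg` import from `@trigger.dev/build/extensions/core` in the same file, so check it against the installed `@trigger.dev/build` version.

```ts
import { defineConfig } from "@trigger.dev/sdk/v3";
// Assumed import path (not shown in the commit); verify against your installed package.
import { puppeteer } from "@trigger.dev/build/extensions/puppeteer";

export default defineConfig({
  project: "<project ref>",
  // Your other config settings...
  build: {
    // Registers the Puppeteer build extension
    extensions: [puppeteer()],
  },
});
```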

docs/examples/intro.mdx

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ description: "Learn how to use Trigger.dev with these practical task examples."
 | [OpenAI with retrying](/examples/open-ai-with-retrying) | Create a reusable OpenAI task with custom retry options. |
 | [PDF to image](/examples/pdf-to-image) | Use `MuPDF` to turn a PDF into images and save them to Cloudflare R2. |
 | [React to PDF](/examples/react-pdf) | Use `react-pdf` to generate a PDF and save it to Cloudflare R2. |
-| [Puppeteer](/examples/puppeteer) | Use Puppeteer to generate a PDF or scrape for data. |
+| [Puppeteer](/examples/puppeteer) | Use Puppeteer to generate a PDF or scrape a webpage. |
 | [Resend email sequence](/examples/resend-email-sequence) | Send a sequence of emails over several days using Resend with Trigger.dev. |
 | [Sharp image processing](/examples/sharp-image-processing) | Use Sharp to process an image and save it to Cloudflare R2. |
 | [Vercel AI SDK](/examples/vercel-ai-sdk) | Use Vercel AI SDK to generate text using OpenAI. |

docs/examples/puppeteer.mdx

Lines changed: 64 additions & 13 deletions
@@ -9,15 +9,15 @@ import ScrapingWarning from "/snippets/web-scraping-warning.mdx";
 
 ## Overview
 
-There are 2 example tasks to follow on this page:
+There are 3 example tasks to follow on this page:
 
 1. [Basic example](/examples/puppeteer#basic-example)
 2. [Generate a PDF from a web page](/examples/puppeteer#generate-a-pdf-from-a-web-page)
-3. [Scrape data from a website](/examples/puppeteer#scrape-data-from-a-website)
+3. [Scrape content from a web page](/examples/puppeteer#scrape-content-from-a-web-page)
 
 <ScrapingWarning />
 
-## Adding build configurations
+## Build configurations
 
 To use all examples on this page, you'll first need to add these build settings to your `trigger.config.ts` file:
 
@@ -29,15 +29,11 @@ export default defineConfig({
   // Your other config settings...
   build: {
     // This is required to use the Puppeteer library
-    external: ["puppeteer"],
+    extensions: [puppeteer()],
   },
 });
 ```
 
-<Note>
-  Ensure you use `puppeteer` not `puppeteer-core` in your build configuration.
-</Note>
-
 ## Set an environment variable
 
 Add the following environment variable in your Trigger.dev dashboard on the Environment Variables page:
@@ -109,7 +105,7 @@ export const puppeteerWebpageToPDF = task({
     const response = await page.goto("https://google.com");
     const url = response?.url() ?? "No URL found";
 
-    // Generate PDF from the webpage
+    // Generate PDF from the web page
     const generatePdf = await page.pdf();
 
     logger.info("PDF generated from URL", { url });
@@ -141,22 +137,77 @@ export const puppeteerWebpageToPDF = task({
 
 There's no payload required for this task so you can just click "Run test" from the Test page in the dashboard.
 
-## Scrape data from a website
+## Scrape content from a web page
 
 ### Overview
 
-In this example we use Puppeteer with a proxy to scrape the content from a webpage and log it out.
+In this example, we use Puppeteer with a Browserbase proxy to scrape the GitHub star count from the [Trigger.dev](https://trigger.dev) landing page and log it.
 
 <ScrapingWarning />
 
 ### Task code
 
 ```ts trigger/scrape-website.ts
-code here
+import { logger, task } from "@trigger.dev/sdk/v3";
+import puppeteer from "puppeteer-core";
+
+export const puppeteerScrapeWithProxy = task({
+  id: "puppeteer-scrape-with-proxy",
+  run: async () => {
+    const browser = await puppeteer.connect({
+      browserWSEndpoint: `wss://connect.browserbase.com?apiKey=${process.env.BROWSERBASE_API_KEY}`,
+    });
+
+    const page = await browser.newPage();
+
+    // Set up Browserbase proxy authentication
+    await page.authenticate({
+      username: "api",
+      password: process.env.BROWSERBASE_API_KEY || "",
+    });
+
+    try {
+      // Navigate to the target website
+      await page.goto("https://trigger.dev", { waitUntil: "networkidle0" });
+
+      // Scrape the GitHub star count
+      const starCount = await page.evaluate(() => {
+        const starElement = document.querySelector(".github-star-count");
+        const text = starElement?.textContent ?? "0";
+        const numberText = text.replace(/[^0-9]/g, "");
+        return parseInt(numberText, 10);
+      });
+
+      logger.info("GitHub star count", { starCount });
+
+      return { starCount };
+    } catch (error) {
+      logger.error("Error during scraping", {
+        error: error instanceof Error ? error.message : String(error),
+      });
+      throw error;
+    } finally {
+      await browser.close();
+    }
+  },
+});
 ```
 
 ### Testing your task
 
 There's no payload required for this task so you can just click "Run test" from the Test page in the dashboard.
 
-<LocalDevelopment packages={"the Puppeteer library"} />
+<LocalDevelopment packages={"the Puppeteer library."} />
+
+## Proxying
+
+If you're using Trigger.dev Cloud and Puppeteer or any other tool to scrape content from websites you don't own, you'll need to proxy your requests. **If you don't, you risk getting our IP address blocked, and we will ban you from our service.**
+
+Here is a list of proxy services we recommend:
+
+- [Browserbase](https://www.browserbase.com/)
+- [Bright Data](https://brightdata.com/)
+- [Browserless](https://browserless.io/)
+- [Oxylabs](https://oxylabs.io/)
+- [ScrapingBee](https://scrapingbee.com/)
+- [Smartproxy](https://smartproxy.com/)
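
The scrape task in this diff reads `BROWSERBASE_API_KEY` straight from the environment and strips every non-digit character before parsing. A missing key would therefore only surface as a connection failure, and a label such as `12.3k` would parse as 123. The sketch below shows two small guard helpers under those assumptions. `browserbaseEndpoint` and `parseStarCount` are hypothetical names that are not part of the commit, and the `k` suffix handling assumes the page abbreviates thousands that way.

```typescript
// Hypothetical helper: builds the WebSocket endpoint used in the task,
// failing fast when the API key is missing instead of connecting with "undefined".
function browserbaseEndpoint(apiKey: string | undefined): string {
  if (!apiKey) {
    throw new Error("BROWSERBASE_API_KEY is not set");
  }
  return `wss://connect.browserbase.com?apiKey=${encodeURIComponent(apiKey)}`;
}

// Hypothetical helper: converts label text such as "9,876" or "12.3k"
// into a whole number, returning null when the text contains no digits.
function parseStarCount(text: string): number | null {
  // Drop thousands separators, then capture the first number and an optional "k" suffix.
  const match = text.replace(/,/g, "").match(/(\d+(?:\.\d+)?)([kK])?/);
  if (!match) {
    return null;
  }
  const [, digits = "0", suffix] = match;
  const multiplier = suffix ? 1_000 : 1;
  return Math.round(Number.parseFloat(digits) * multiplier);
}
```

Inside the task, `browserbaseEndpoint(process.env.BROWSERBASE_API_KEY)` could replace the inline template string, and `parseStarCount` could run on the raw text returned from `page.evaluate`, so a `null` result can be logged rather than silently returned as `NaN`.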
