Web Scraping and Crawling in Bubble Without Writing Code
Firecrawl is an API that handles web scraping and crawling at scale, returning clean structured data from any website. For Bubble builders, it connects through the API Connector using a cURL-based setup that Firecrawl's playground generates for you. The integration involves two API calls, a backend webhook workflow, and a simple data structure, and it results in a fully functional crawler built inside a Bubble app.
Starting in the Firecrawl Playground
Before touching Bubble, testing the crawl in Firecrawl's playground gives you immediate feedback on what data the API returns and how to configure the extraction. The key parameters include the target URL, a page limit to control cost during testing, and an extract configuration that tells Firecrawl what to look for on each page.
The extract configuration accepts a JSON schema and a prompt. For example, asking Firecrawl to return a Boolean for whether a page mentions AI, plus a text summary of how AI is being used, produces structured per-page output that maps directly to Bubble data fields. Once the playground returns results that look right, the Get Code option in cURL format gives you the exact request structure to paste into Bubble.
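As a rough sketch, an extract configuration of that kind might look like the following. The field names (mentions_ai, ai_summary) are illustrative, not prescribed by Firecrawl; the playground generates the exact structure for you, and that generated version is what you should copy.

```json
{
  "schema": {
    "type": "object",
    "properties": {
      "mentions_ai": { "type": "boolean" },
      "ai_summary": { "type": "string" }
    },
    "required": ["mentions_ai"]
  },
  "prompt": "Determine whether this page mentions AI. If it does, summarize how AI is being used."
}
```

Keeping the schema flat, as here, pays off later: each property maps one-to-one onto a field of a Bubble data type.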
Setting Up the Bubble API Connector
Create a new API in the API Connector, name it after the service, and set authentication to Private key in header. The header key is Authorization and the value is Bearer [your API key]. The content type does not need to be set manually, since application/json is Bubble's default.
Create the crawl call as a POST action (not Data), paste the endpoint URL from the Firecrawl documentation, and paste everything inside the outer quote marks of the cURL data parameter into the body field. Bubble's JSON syntax validation will flag formatting errors, which is useful since JSON is unforgiving of stray commas and missing brackets.
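A hedged example of what the pasted body might look like after you replace the test URL with a Bubble dynamic value. The angle-bracket token is a Bubble API Connector parameter (created by wrapping a value in < >); the limit value and the scrapeOptions structure are illustrative, so defer to the cURL snippet the playground actually generated for your crawl.

```json
{
  "url": "<target_url>",
  "limit": 10,
  "scrapeOptions": {
    "formats": ["extract"],
    "extract": {
      "schema": {
        "type": "object",
        "properties": {
          "mentions_ai": { "type": "boolean" },
          "ai_summary": { "type": "string" }
        }
      },
      "prompt": "Determine whether this page mentions AI and summarize how."
    }
  }
}
```

Remember to untick "private" on the target_url parameter so it can be supplied dynamically from the page workflow.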
The Webhook Problem: Crawls Take Time
Crawling more than a handful of pages can take minutes, not seconds. An API call that takes minutes will time out in a standard Bubble workflow. The solution is a webhook: instead of waiting for a synchronous response, Firecrawl notifies a Bubble backend workflow endpoint when the crawl is complete.
In the API call body, add a webhook object containing the URL of your Bubble backend workflow endpoint and the event type completed. Bubble's backend endpoint URL is found in the backend workflow's settings, formatted as [app domain]/version-test/api/1.1/wf/[workflow name]. The version segment must change when the app goes live, so build the webhook URL dynamically, for example from Bubble's Website home URL value, rather than hardcoding version-test.
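The webhook object added to the body might look like the sketch below. The app domain and workflow name are placeholders for your own values, and the exact property names (url, events) should be confirmed against Firecrawl's current webhook documentation:

```json
{
  "webhook": {
    "url": "https://yourapp.bubbleapps.io/version-test/api/1.1/wf/firecrawl_complete",
    "events": ["completed"]
  }
}
```

In the live app, the version-test segment disappears from the URL, which is exactly why the base of this URL should be assembled dynamically rather than typed in once.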
Creating the Backend Receiver Workflow
In Bubble's Backend Workflows, create a new API workflow, name it (for example, firecrawl_complete), set it to public, and enable Run without authentication. This allows Firecrawl to call the endpoint as an external service without needing a logged-in session.
Use Request Data > Detect Data, which puts the endpoint into detection mode, then trigger a crawl with the webhook pointed at this endpoint's initialize URL. When Firecrawl delivers the webhook notification, Bubble captures the data structure and learns what fields are available, including a crawl ID you can use to fetch the results.
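For orientation, a completion webhook payload could have roughly this shape. This is an assumption about Firecrawl's notification format, not a documented contract; every field name here, and the ID value, is hypothetical, and the authoritative structure is whatever Bubble's Detect Data step actually captures:

```json
{
  "type": "crawl.completed",
  "id": "example-crawl-id-1234",
  "success": true
}
```

The id field is the piece that matters for the next step, since it identifies which crawl's results to fetch.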
Fetching the Results and Saving to the Database
Once the webhook fires, use the crawl ID from the request data to make a second GET request to Firecrawl's status endpoint. Initializing this call returns the full crawl data, with each page's extracted information available as a list.
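The status call is a GET to Firecrawl's crawl endpoint with the ID appended (for example, a path like /v1/crawl/[crawl ID], though the exact path should be taken from Firecrawl's docs). Its response might look roughly like this, with page contents and field names shown purely for illustration:

```json
{
  "status": "completed",
  "total": 2,
  "data": [
    {
      "url": "https://example.com/blog/post-1",
      "extract": {
        "mentions_ai": true,
        "ai_summary": "The post describes using an AI assistant for drafting."
      }
    },
    {
      "url": "https://example.com/about",
      "extract": {
        "mentions_ai": false,
        "ai_summary": ""
      }
    }
  ]
}
```

The data array is the list Bubble iterates over in the next step, and each element's extract object carries the per-page fields defined in the schema.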
Create a data type in Bubble with fields matching your extraction schema (URL, summary, mentions AI as a yes/no field, how AI is used as text). Then use Schedule API workflow on list to process each item in the crawl response. This is Bubble's equivalent of a loop: it calls a single-item workflow once for each element in the list. Inside that single-item workflow, use Create a new thing to save each page's data to the database.
Building the User Interface
The front end needs only an input for the URL to crawl, a button to trigger the workflow, and a table element to display the saved results. The crawl button's workflow calls the initial Firecrawl API action, passes the user's URL and the backend webhook endpoint as JSON-safe dynamic values, and shows a toast notification confirming the request was submitted.
When Firecrawl completes the crawl and the webhook fires, the backend workflow saves the data, and the table updates automatically because it is bound to the database.
Practical Applications
The same architecture supports any use case that requires regular extraction from multiple pages: monitoring competitor blog topics, tracking industry news for AI-related content, building a site map of any domain, or gathering structured product data. The extraction configuration is the only thing that changes between use cases, and the playground makes iteration fast before committing to a Bubble build.