In this video, I'm going to demonstrate how to scrape web data into your Bubble application, and I'm going to be using the web scraper Page2API. I was working on a recent client project and tried a number of different web scraper APIs, and I found that Page2API offered the best integration for what I was trying to do with the Bubble API Connector plugin.
Understanding API documentation
So that's what I'll be demonstrating to you now. If we head into the Bubble API Connector (install this plugin by Bubble if you haven't already), we'll add another API. This is Page2API, and then we're going to add a call. For the purposes of this demonstration, we will be scraping the H1 tag.
This is the HTML tag that identifies the most important heading on the webpage, so it is quite a common target for web scraping. We're just going to call it 'scrape H1'. And then we have to dig into the Page2API documentation in order to know how to fill out our API call here.
If you are looking at API documentation, personally I find the easiest section to translate into Bubble is the cURL section. Not Ruby, not Python: cURL is the easiest one to translate into Bubble.
Bubble API Connector
So I need to make a POST call. In the header, I have to make this declaration here: 'Content-Type: application/json'. Okay. And then this little '-d' tells me that the rest of the content here is to be sent as data, but you could also think of it as going in the body of the call. So I have to make that. Okay. So it's a POST, and we make the call to this address here.
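To make the mapping from the cURL snippet to the API Connector concrete, here is a minimal sketch in Python of the same three pieces: the method, the Content-Type header, and the JSON body. The field names ('api_key', 'url', 'parse') and the selector syntax are assumptions based on my reading of the Page2API docs, so check the current documentation for the exact body the API expects.

```python
import json

# The three pieces of the cURL snippet, as the API Connector sees them:
# method POST, one -H header, and the -d data (the call's body).
headers = {"Content-Type": "application/json"}  # the -H header

body = {                                    # the -d data / JSON body
    "api_key": "YOUR_API_KEY",              # placeholder, not a real key
    "url": "https://www.bbc.co.uk",         # the page to scrape
    "parse": {"title_html": "h1 >> html"},  # label -> selector (assumed syntax)
}

payload = json.dumps(body)  # what actually travels in the POST body
print(payload)
```

In Bubble you paste the body text straight into the call rather than building it in code, but the structure is the same.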
Then there are a few other things we need to add in. Oh, before we do that: this is a common mistake I see all the time when I'm making API integrations. Swap this from 'Data' to 'Action'. 'Use as Data' allows you to pull in information, a little bit like a 'Do a search for'; so if you wanted to populate a dropdown with a list of time zones, you would use 'Data' here. (Sorry, 'Use as Data', that's a bit confusing for the list-of-time-zones example.) But 'Action' enables you to make this call in a workflow, which is what we want to do: we want to make the call and then save the result to our database.
Now for the other parts that we need here. I'm just going to copy and paste this into the body of my API call, and because it's taken from an example, there's a lot here that we don't need and that we need to edit in order to make it work. We can make the URL a dynamic value by using angle brackets, so <url>, and then we want to untick 'Private' because we want to be able to insert a value in here in our API call in the workflow.
And then we just want to target the H1. I'm going to delete these other lines here, making sure that I don't leave a stray comma at the end, otherwise the JSON will be invalid. Oh, and we need to put a valid URL in here. Let's give it a test.
So there's something wrong with my JSON: I've not got a closing bracket. You can see that you have this parent bracket here, and I have one down here, but actually the 'parse' content has its own. So let's lay it out really neatly: I need a bracket down here. Let's try that. Okay. Web scraping takes a few moments, but there we go.
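Both of the mistakes above (the stray trailing comma and the missing closing bracket) are easy to reproduce outside Bubble. A quick sketch with Python's json module shows that either one makes the body invalid JSON:

```python
import json

valid = '{"url": "https://example.com", "parse": {"page_h1": "h1"}}'
stray_comma = '{"url": "https://example.com", "parse": {"page_h1": "h1"},}'
missing_brace = '{"url": "https://example.com", "parse": {"page_h1": "h1"}'

json.loads(valid)  # parses fine

# Both broken variants raise JSONDecodeError rather than parsing.
for broken in (stray_comma, missing_brace):
    try:
        json.loads(broken)
        print("unexpectedly parsed")
    except json.JSONDecodeError as err:
        print("Invalid JSON:", err.msg)
```

Pasting your call body through a validator like this (or any online JSON linter) before initializing the call in Bubble saves a round trip to the API.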
So you can see here that it has returned 'title_html', which is the label that I assigned in the body of the JSON, but it's returned the whole H1 portion of HTML. Now there's a way to get around that. If I go back into the documentation and have a look for data extraction, you'll see that I can tell Page2API what sort of data I want to get back.
I just want to get back text, so I can go back in here and add in that expression there. That's the label that it returns, so in fact we will rename it to page_h1, and then let's initialize the call. Like I was saying, this takes a few seconds to complete.
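As a rough sketch of that change, the two parse specs differ only in the label you choose and the extractor on the end of the selector. The '>> html' / '>> text' syntax here is my reading of the Page2API docs and may differ from the current API, so verify it there:

```python
# Label -> selector pairs as they'd appear in the call body.
# An html extractor returns the whole element; text returns just the inner text.
parse_html = {"title_html": "h1 >> html"}  # e.g. "<h1 ...>BBC Homepage</h1>"
parse_text = {"page_h1": "h1 >> text"}     # e.g. "BBC Homepage"
```

The label ('title_html', 'page_h1') is whatever key you pick; it's also the key Bubble will show you when you initialize the call and map the response.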
And one reason for that is that I have the scraper using a real browser. There we go. So the request was made with a real browser, and the advantage of that is that, in my experience using web scrapers, most websites will block an attempt to scrape their content if it doesn't look like it's coming from a human or from a legitimate indexing bot, like Google's.
You can turn this off on most web scraping services, and by not using a real browser you'll save some money per request, but I've found that using one is just much more reliable. And there we go, I get the call back: 'BBC Homepage' is the H1 for that page, so my expression works. Now let me demonstrate how to add this into the design of your Bubble app.
So I've got a repeating group here which shows a list of websites, and I want to be able to add a URL in here, click 'Scrape', and have it added to my database. So let's do that. When the button is clicked: Plugins. Because I have it set as an Action, I see my API call here, and then I link this up to my input. Then I'm going to reset my input so I can place more than one call through very quickly. Lastly, and perhaps most importantly, I need to add it to my database. I have already created a data type called 'Website', and I'm going to save my H1 in here. This is the key bit: 'Result of step 1', and then looking for my label, which is page_h1.
Just so I know what website I've called, I also want to reference the URL, so that's the input's value. But as it stands I'd be referencing an empty input, because I put this step after my reset; I'll just pop it before the reset instead. Then let's give it a refresh and give it a try.
Web scraping limitations
So one of the things you'll realize with web scrapers is that they're not that clever; rather, you have to do a lot of the supporting work yourself, like providing them with a correct URL. So if I was building this into an app that other users were going to be using, I would find ways in the expression here to ensure that 'https://' et cetera is included, and to check that the URL is valid, otherwise it isn't going to work. So let's try it. Okay, there we go, it's come across. Let's try another one. In fact, let's try to demonstrate it not working, so let's put in a deliberate error. Okay, Udemy lives on www.udemy.com.
In that instance it worked, probably because Udemy has a redirect set up from the root domain to the www subdomain, but let's make an even more deliberate error.
Okay, there we go. So the API call is throwing up that there is an issue. So you'd want to find a way in your Bubble app of handling errors, perhaps nudging users with the input's placeholder text showing how to correctly enter a URL, to make the web scraping process as reliable as possible.
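To make that pre-flight check concrete, here's a minimal sketch of the kind of normalization you could run on a user-entered URL before handing it to the scrape call. The function name and rules are my own for illustration (in Bubble itself you'd express this with conditionals on the input's value), but the logic is the same:

```python
from urllib.parse import urlparse

def normalize_url(raw: str):
    """Best-effort cleanup of a user-entered URL; returns None if hopeless."""
    candidate = raw.strip()
    if not candidate:
        return None
    # Prepend a scheme if the user typed e.g. "bbc.co.uk"
    if not candidate.startswith(("http://", "https://")):
        candidate = "https://" + candidate
    # Require a host with at least one dot to catch obvious typos
    if "." not in urlparse(candidate).netloc:
        return None
    return candidate
```

With this in place, 'bbc.co.uk' becomes 'https://bbc.co.uk' and plain 'nonsense' is rejected before you spend a paid API request on it.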