AI Web Scraping Product Hunt

In this Bubble tutorial we explore how to use Browserbear's AI web scraping to scrape the Product Hunt homepage.

If you're looking for a way to web scrape data into your Bubble app, then look no further than BrowserBear. BrowserBear is an exciting addition to the web scraping scene because not only does it have a very easy and powerful API, it also makes use of AI to detect which elements of the page to extract and can even loop through those and then send them all to an endpoint on your Bubble application.

How to learn Bubble.io

So in this Bubble tutorial video, I'm going to show you how you can scrape the front page for Product Hunt and get a list of all of the products into your Bubble app. And before I launch into that, if you're learning Bubble, you're watching this video, it's good to see you. Give us a like and subscribe. But also we have got hundreds of Bubble tutorial videos, many of which you cannot find on our YouTube channel. You can only find them on our website planetnocode.com.

Create a Browserbear task

But let me show you how you can use BrowserBear and just how quick and easy it is. First of all, I'm going to dive into my BrowserBear account. And I'm in fact using one of the templates that the developer BrowserBear has created, and that is using AI web scraping. And I'm going to take you through each of the steps here. So I have a Go action and it's saying go to ProductHunt.com and wait until the network is idle. Then I'm going to save links. And basically this is, effectively I'm guessing behind the scenes, constructing an appropriate prompt for whichever AI BrowserBear is using. And so I'm saying articles, but maybe I should go with products. Let's try products instead. And then save data. So this is a looping action because I don't want it to just save one item on the Product Hunt homepage.

In fact, let's have a look. We've got Glow, we've got CopyPartner, KEM, we've got Monday.com. There's an ad down there. Let's see how it treats that. Yeah, these are just the defaults that it sets out. You could use this to scrape other content like a blog, for example. Let's just hit save on that for now. And then I can test the API through the dashboard here and I can get the results. Now something I've already noticed is that, I think because it's using the AI aspect, it does take slightly longer to get a response than other web scraping tools I've used that don't have AI. Of course, the advantage of using the AI is that I've not had to open up my browser's inspect tool and find a class or ID that is declared for each of the elements that I want to extract. This is so much quicker to do in the setup process. So I've got that and basically I'm now going to go into my Bubble app. Here we go.

Browserbear & Bubble API Connector

I've set up BrowserBear in the API Connector because I need to send a command to run what BrowserBear calls a task. So I've got a new API set up here. For authorization, I've added in my API key, I've added in the essential shared header, Content-Type: application/json, and then I've got a call named run task. All of this can be found in the BrowserBear documentation, by the way. I'm simply following the structure that they've got here, which is that each task you create in BrowserBear has an ID, you run that task, and you pass in parameters. Pretty standard API stuff. And the parameter that we need to add in is the webhook URL, because we want to say where BrowserBear sends the data once it has finished the scrape, used the AI, and extracted the data. And we're going to get onto that. So I've got my task ID. Where do I find my task ID? Well, I find it up here. Copy that. And paste it into that.
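For anyone who wants to see the same call outside the API Connector, here is a minimal Python sketch of the run task request described above. The base URL, the `/tasks/{task_id}/runs` path, the `Bearer` authorization scheme, and the `webhook_url` field name are assumptions, as is the helper name `build_run_task_request`; check them against the BrowserBear API reference before relying on them. The function only builds the request and does not send it.

```python
import json
import urllib.request

# Assumed base URL; confirm it in the BrowserBear API reference.
API_BASE = "https://api.browserbear.com/v1"


def build_run_task_request(task_id: str, api_key: str, webhook_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking BrowserBear to run a task."""
    # The webhook URL tells BrowserBear where to send the scraped data.
    body = json.dumps({"webhook_url": webhook_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/tasks/{task_id}/runs",  # assumed path
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )
```

To actually fire the request you would pass the result to `urllib.request.urlopen(...)`. In Bubble, the API Connector plays that role.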

Bubble API Endpoint

And then I'm adding in the parameter of webhook URL. Right, how do we set that up in Bubble? Well, I've got another Bubble tab open here and I've added in a back-end workflow called BrowserBear, and it is public and it can be run without authentication. Obviously, you need to understand the risks of doing that, but BrowserBear can't authenticate with Bubble. And then I'm going to click detect data and it gives me this URL. And I'm going to take this URL and paste it into the webhook URL parameter here.

Notice that it's got initialize on the end. And I'm actually going to go back and point two things out. First, it's got initialize. That is basically putting the endpoint in learning mode, and I need that so that Bubble can learn the structure of the incoming data. Second, notice that it's got version-test in here, and this is using a Bubble temporary domain. So just be aware of how this will impact your endpoints if you add your own domain. The webhook URL also differs between versions of your Bubble app, such as your development version and your live version. So you're going to want to make the webhook URL dynamic in the way that you send it to BrowserBear, because you'll want to swap out different bits of it. But for now, this will work.

And I'm going to go back into the API Connector and click initialize call. I get a response back basically saying that BrowserBear is running the task. And then if I just refresh here, I can go into logs and I can see that it's currently processing this scrape. And I'm just making sure that the endpoint is still in initialize mode, because I'm expecting in the next 20 to 30 seconds to get that data back. Remember that this is what I'm scraping from Product Hunt, and I can go back into BrowserBear and view what's going on here. Let's see what happens. It might take a few more seconds. So I've got the data back from BrowserBear and here we go.
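To make the moving parts of that webhook URL concrete, here is a small Python sketch that assembles it from its pieces. It assumes the usual Bubble pattern of `/version-test/` for the development version, no version segment for live, and `/api/1.1/wf/<workflow>` for back-end workflows. The function name, the app domain, and the workflow name `browserbear` are hypothetical placeholders.

```python
def bubble_webhook_url(app_base: str, workflow: str,
                       live: bool = False, initialize: bool = False) -> str:
    """Assemble a Bubble back-end workflow URL (assumed URL pattern)."""
    parts = [app_base.rstrip("/")]
    if not live:
        # The development version carries a version-test segment.
        parts.append("version-test")
    parts += ["api", "1.1", "wf", workflow]
    if initialize:
        # Only while Bubble is detecting the structure of the incoming data.
        parts.append("initialize")
    return "/".join(parts)


# Development endpoint in learning mode, as used in this walkthrough.
dev_url = bubble_webhook_url("https://myapp.bubbleapps.io", "browserbear", initialize=True)

# Live endpoint once detection is done (hypothetical custom domain).
live_url = bubble_webhook_url("https://www.example.com", "browserbear", live=True)
```

Inside Bubble you would build the same string with dynamic expressions rather than Python. The point is that the domain, the version segment, and the initialize suffix are the three parts that change.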

Testing

This is all of the lovely data that I scraped from Product Hunt, including this list of content here. And then I'll go down here. So you can see I've got LinkedIn, I've got AI Designer, I've got free status page. You can see that these are all posts from Product Hunt. So before I wrap this video, I would just point out and remind you that if you actually wanted to run this, you'd need to remove initialize because now it's in the position to actually send data to that endpoint. And of course, make sure that you make your version, this part here, dynamic so that in your live version, it's not sending data to your dev version. If you've liked this video, please like and subscribe on YouTube. That really helps us. And remember that you can find even more Bubble tutorial videos at planetnocode.com.
