Revolutionize Your Web Scraping: 10,000 Pages from ONE URL with FireCrawl!
Web scraping 10,000 pages from a single URL with FireCrawl! This powerful web scraping tool allows you to scrape multiple web pages into your Bubble.io app.
Unlock the power of web scraping: Learn how to crawl 10,000 pages from a single URL!
Master Bubble.io's API connector and transform your app with dynamic data extraction.
Discover how to leverage backend workflows for efficient data processing and storage in your Bubble app.
Introduction to FireCrawl
FireCrawl is an amazing web scraping tool that lets you scrape single or even multiple web pages into your Bubble app. In this Bubble tutorial video, I'm going to show you how to use its crawl feature. You provide one page, and it goes out and crawls the rest of the site: it follows any links it can find, or finds the sitemap and uses that to guide its crawl. Effectively, it allows you to take one URL and pull a load of additional content straight from a website. But before I dive into that: if you're watching this video, it's because you've got a business idea and you're trying to build it with no code. If you want to accelerate that process, click the link down in the description and head over to our website, planetnocode.com.
Setting Up FireCrawl in Bubble
So we're going to be looking at the crawl endpoint here: taking a single URL, getting additional URLs and scraping that data. To do that we need to go into our Bubble app and into the API connector. You can see other services that I've done demos on: Claude, ElevenLabs and Carbon. A demo of Carbon is coming soon; it's a really powerful AI tool for chunking and vectorizing data. But for now we're just looking at FireCrawl.
Configuring the API Call
I've gone ahead and added another API. I've named it FireCrawl. For authentication I've chosen private key in header, set the key name to Authorization, and for the value I've written Bearer, then a space, then my API key. I've blurred that out to save myself having to refresh it. We then go ahead and add a call, so now let's dive into the documentation. To use the crawl endpoint we have to take the URL from here, and we can see that it is a POST.
Setting Up the API Action
So change this to POST and we'll name it crawl website. I'm changing it to an action so that it can be a node in a workflow rather than just a data source, because I want to, say, click a button and run this workflow. Let's go back to the documentation, and you can see this is where I got how to lay out the authentication: it goes in the header as Authorization: Bearer, followed by the token. We also have this header, Content-Type: application/json.
Configuring API Parameters
As of a few months ago, that is the default in the Bubble API connector unless you specify a different value, so we no longer have to specify it. Now I'm going to paste the request body into here. And then, just so that we can limit our spend, I'm going to limit the number of pages it returns to three.
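If you'd like to see the same request outside the API connector, here is a minimal Python sketch of what the call looks like. The endpoint URL and the body field names (`url`, `crawlerOptions`, `limit`) are assumptions based on the version of the FireCrawl documentation used in this video, so check them against the current docs. The function only builds the request; it doesn't send anything.

```python
import json

# Assumed endpoint; confirm against the current FireCrawl documentation.
CRAWL_ENDPOINT = "https://api.firecrawl.dev/v0/crawl"


def build_crawl_request(api_key: str, url: str, page_limit: int = 3) -> dict:
    """Describe the POST request that starts a crawl (nothing is sent)."""
    return {
        "method": "POST",
        "url": CRAWL_ENDPOINT,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        # Field names are assumptions; the limit keeps spend down while testing.
        "body": json.dumps({"url": url, "crawlerOptions": {"limit": page_limit}}),
    }


request = build_crawl_request("YOUR_API_KEY", "https://example.com")
print(request["headers"]["Authorization"])
print(request["body"])
```

Keeping the page limit as a parameter with a small default mirrors what I do in the connector: start with three pages, and only raise it once the data looks right.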
Additional API Options
But you can see you get all of these additional criteria that you can specify. You can use this to just return a list of URLs, but you can also use it to return a list of URLs and the pages expressed as markdown. That's useful if you want to put this into an AI prompt: you can basically take a small website and put it into an AI prompt as data. We've also got an upcoming video on Carbon.
Using Scraped Data with AI
Carbon actually has its own web scraping tool, but you could use it in a similar way. Rather than inserting everything into the prompt, you chunk and vectorize the content into a database and use that to extract just the right data, which reduces your prompt size, and then you put that into the prompt for the LLM that you're using. So this is the bare minimum of what I need, and I'm selecting everything inside of the quote marks here. I'm going to copy that, go back into my Bubble app and paste it in here. I'm going to make this dynamic.
Making the API Call Dynamic
So I'm highlighting the speech marks as well, because I'm going to make it JSON-safe, and JSON-safe puts the speech marks back in place. I'll just call this URL. You could of course have a larger crawl limit. So for now, let's put the speech marks back in place.
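If it helps to see why the speech marks come out of the body, here is a rough Python analogy. Python's `json.dumps` does to a string roughly what Bubble's JSON-safe formatting does as I've described it: it escapes special characters and wraps the value in double quotes. This is an analogy under that assumption, not Bubble's actual implementation, and the input value is made up for illustration.

```python
import json

# A made-up input containing characters that would break hand-written JSON.
user_input = 'https://example.com/search?q="fire crawl"'

# json.dumps escapes the inner quotes and adds the surrounding quotes itself,
# so the template must not wrap the placeholder in quotes a second time.
json_safe_url = json.dumps(user_input)
body = '{"url": ' + json_safe_url + "}"

print(body)
parsed = json.loads(body)  # parses cleanly because the value was escaped
```

If the template kept its own speech marks around the placeholder, the value would end up double-quoted and the body would no longer be valid JSON.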
Considerations for Web Scraping
Paste in the BBC URL. Now, we do have to consider that web crawling and web scraping are not 100% reliable. That's because many websites put defenses in place to stop you scraping their content. I mean, there's a whole legal battle about OpenAI effectively scraping the whole web, including many paid journalistic sources. So if this doesn't work, that's probably the reason.
Testing the API Call
So let's initialize the call. Also, a heads-up: in my experience, a web scraper may work one day and not work another day, so it is really worth considering how you handle errors with it. But it looks like something is happening. Good.
Handling the API Response
Okay, this has worked. There are no errors. We get back a job ID and that's all we get back. That's simply because, unlike, say, scraping content from one web page, this is a job which may take longer than is reasonable to expect the API connector to wait. So we get a job ID and we then need to use it to check the status.
Setting Up Job Status Check
So I'm going to copy the job ID and click save. That's good. This has worked: it's taught Bubble what to expect from this particular API request. So let's dive back into the documentation, because we now need to move on to check job status. Once more we can copy this, and we can see that this is now a GET request.
Configuring Job Status Check
So let's go back into the Bubble API connector, paste it in, and I just need to change the brackets into square brackets, because that's what Bubble asks for around a parameter. We want to dynamically insert the right job ID. It's not going to be private, because we need access to it in a workflow. So now let me paste in that job ID, and I'll name the call get crawl status.
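To make the square-bracket parameter concrete, here is a small Python sketch of the same substitution. The status URL pattern is an assumption based on the documentation version used in this video, and `[job_id]` is just a placeholder name I've picked for illustration.

```python
from urllib.parse import quote

# Assumed URL pattern; the square brackets mimic Bubble's dynamic parameter syntax.
STATUS_TEMPLATE = "https://api.firecrawl.dev/v0/crawl/status/[job_id]"


def build_status_url(job_id: str) -> str:
    """Substitute the job ID into the GET URL, escaping unsafe characters."""
    return STATUS_TEMPLATE.replace("[job_id]", quote(job_id, safe=""))


print(build_status_url("1234-abcd"))
```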
Reviewing Crawl Results
Now, I've only asked it to crawl three pages, so I wouldn't be surprised if this is done. Yep, here we go. We get back three results and we get tons of data. I mean, just look at the scroll bar there for what it scraped from the BBC. It says three pages, each with its content as markdown. We've got descriptions, we've got the links on each page. There's a wealth of data here. There's one final step, but before I go on to that: there is no webhook here, so you can't have your app notified when the crawl is complete. So let me build out roughly how I would handle this.
Building the Workflow
So let's dive into the Bubble app, where I've got a really simple bit of UI designed. I'm going to add an action and search for FireCrawl, and I see it listed here because this is exactly what I've called it in the API connector. If you see something different, it's either because you've named it slightly differently in your app or because you've not successfully initialized the calls. If you don't do that, they won't show up. So let's take our input's value, oh, and make it JSON-safe.
Setting Up Backend Workflow
Okay, I'm now going to set up a backend workflow, because I want to be able to check the crawl status. I'm doing it in a backend workflow because I don't want my user to have to stay on the page while it checks every 5 seconds or something like that. If they leave the page, on-page workflows aren't going to run, whereas a backend workflow runs on the server regardless of what my user is doing. It doesn't need to be public. It just needs to have one parameter, which is job ID.
Configuring Workflow Steps
And then let me just check in the API connector. I'm going to change this call to an action. That just means that it's a node, a block in a workflow. I think it makes slightly more sense here.
Setting Up Workflow Logic
So I'll say get status... no, what did I call that? Crawl status. There we go. And instead of the static job ID here, I want to dynamically fill it with the job ID that's passed in at the top level of the workflow, and then this is going to return...
Handling Workflow Results
In fact, let's just dive back into the documentation. If the job is not complete, the response includes content within partial data. So there we go. So in the status string of the response, we need it to say completed. What I can do here is schedule the workflow back on itself. For the time I could say current date/time plus a minute, and pass in the same job ID, but only when the result of step one's status is... and I'm going to double check, because this needs to be an exact match.
Implementing Workflow Loop
I'm going to copy across completed. No, no, it doesn't need to be that. It needs to be active: status is active. That means that it's still running. And in fact I'll add a terminate workflow step with the same condition. So this just means that it's going to loop upon itself.
Considerations for Workflow Looping
Now, you'd probably want to put in additional checks here, because you wouldn't want to create an infinite loop. Maybe you could have some sort of counter as it goes around. Just be aware, especially with the introduction of workload units, that when you loop a backend workflow on itself like this, you need some sort of protection in place to ensure that it doesn't run infinitely because you introduced an error, for example. I don't know, maybe this only-when statement could cause issues. I'm just putting the key principles in place here.
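The counter idea can be sketched in Python like this. It's a minimal illustration of the principle, not the Bubble workflow itself: `wait_for_crawl` and its parameters are hypothetical names, the status call is passed in as a function so the example runs without touching the network, and the `"active"` status value is the one used in the condition above.

```python
import time
from typing import Callable


def wait_for_crawl(
    fetch_status: Callable[[str], dict],
    job_id: str,
    max_attempts: int = 10,
    delay_seconds: float = 60.0,
) -> dict:
    """Poll until the job is no longer active, giving up after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        result = fetch_status(job_id)
        # "active" mirrors the status value used in the workflow condition.
        if result.get("status") != "active":
            return result
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise TimeoutError(f"Crawl {job_id} still active after {max_attempts} checks")


# Stand-in for the real status call: active twice, then completed.
_responses = iter(
    [{"status": "active"}, {"status": "active"}, {"status": "completed", "data": []}]
)
final = wait_for_crawl(lambda _job_id: next(_responses), "demo-job", delay_seconds=0)
print(final["status"])
```

Note that, like the only-when condition in the workflow, this lets any status other than active fall through, including a failure. Checking explicitly for completed before saving anything would be the stricter option.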
Saving Scraped Data
We terminate it when the status is active because the final thing we want to do, once the crawl is done, is create a new thing. I've got web scraping as a data type from a different tutorial, but, oh, actually, I get back a list of pages. Okay. So if I wanted to save the first page that came back, I could say result of step one's data, first item's title. Okay, so that's saving the title.
Handling Multiple Pages
But remember, I'm getting back three web pages. So actually I need another backend workflow. I'll call this one save web page, and I'm just going to pass in title and content markdown. I like working with markdown. I think markdown is really helpful for use with AIs because it adds a bit more structure. So here we have the create a new thing action, for a web scrape.
Creating Database Entries
And so I'd say title equals the title that I pass into this backend workflow. We'll add a new field and call it markdown. The type of that data is text. If you have issues displaying markdown in your Bubble app, you can find plugins that convert it into HTML.
Setting Up List Processing
Okay, so now I schedule a backend workflow on a list. To find the type of thing that I'm listing through, I have to go back to the API connector. So, get crawl status. Now this is just a little bit confusing. It's a little bit annoying that Bubble creates really long names here, and you can't rename this.
Finalizing Workflow Configuration
So I'm looking for get crawl status data, because that's the list element: each web page is one item in crawl status data. Let's go back. So, crawl status data, I think that is, and then the list. This will tell me if it's matched up. Yeah. Okay.
Because that's gone blue, I know that the data going in and the type of data it's expecting have successfully matched. So now I can run my save web page workflow. And when I refer to this get crawl status data, I'm now referring to a single item in the list.
So we'll go for the title and we'll go for the markdown, and we'll just run this straight away and leave the interval clear. Okay, let's go back to our page.
Initiating the Crawl Process
The last thing to do is to schedule that initial check. So we say check crawl status. The job ID is the result of our initial request to FireCrawl. And then we'll say current date/time, and you could add 30 seconds here. It really depends on how many pages you're scraping.
Workflow Overview
And you can use the playground in your FireCrawl account to get an idea of which websites it scrapes really well and how long that takes. So this is effectively saying: when this button is clicked, send the URL to the API call that we set up, take the job ID and pass that into a backend workflow that will run in 30 seconds. Then we go to backend workflows and we check the status. If the status is active, that means it isn't completed, so we run this workflow again in a minute's time and terminate this run. That way, we're not running the rest on incomplete data.
Processing Scraped Data
Then, when this goes through and it is completed, we're saying for each item, each web page that is returned, we will run a save. We're using schedule API workflow on a list here because we want to be flexible with the number of pages. If you knew that you were only ever going to be scraping three pages, you could add in three create a new thing actions here. But I like to make things dynamic and flexible, so we could run this on 100 pages.
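The save-each-page step can be sketched in Python as a loop over the returned list. The response shape here (`data`, with `markdown` and `metadata.title` on each item) is an assumption based on what comes back in this video, and the sample data is invented for illustration; `pages_to_records` is a hypothetical helper name.

```python
def pages_to_records(crawl_result: dict) -> list[dict]:
    """Turn each crawled page into a title/markdown record ready to save."""
    records = []
    # A missing "data" key (for example, while the job is still active) gives no records.
    for page in crawl_result.get("data") or []:
        metadata = page.get("metadata") or {}
        records.append(
            {
                "title": metadata.get("title", ""),
                "markdown": page.get("markdown", ""),
            }
        )
    return records


# Invented sample response, for illustration only.
sample = {
    "status": "completed",
    "data": [
        {"markdown": "# Home", "metadata": {"title": "Home"}},
        {"markdown": "# News", "metadata": {"title": "News"}},
    ],
}
for record in pages_to_records(sample):
    print(record["title"], len(record["markdown"]))
```

Because it loops over whatever list comes back, the same code handles three pages or 100, which is the same reason for scheduling the workflow on a list rather than hard-coding three saves.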
Considerations for Large-Scale Scraping
Just be a little bit careful of the bill that you rack up with FireCrawl. I mean, it's really cheap, it's really good value, but do be aware of how easily and quickly that can escalate. In particular, test to see if you're getting the right sort of data back before you run it on something 100 or even 1,000 times. We've been building an internal project at Planet No Code, and I've used FireCrawl to scrape 1,000 pages. It took about ten minutes.
Using AI for Data Processing
They all got saved into the Bubble database. I then used OpenAI or Claude to clean up the markdown, to reduce it down and extract the key data from each page. And if you do want to extract data from each of these web pages, I'd really recommend using Claude, because Claude's tool function allows you to extract structured data. We've got videos on that. In fact, if you search for Claude secret JSON mode, you'll find our video on it.
Conclusion
So if you've got any questions, please leave your comments down below and I'll see you in the next one.