This Bubble tutorial video is for anyone who isn't a Regex wizard and has been struggling to find a way to extract data from a large portion of text.
In this case, we're imagining that we've passed in an email, and all of the email's metadata and content has come through in a single text field. So how do we go about extracting it? Well, this is a little technique that I developed when I was faced with a similar problem a few weeks ago. I'm just using a multiline text input here so that we can see the data, but you could use this if the data was coming in through an API. So if I do that and then click preview, you'll see that everything is carried across.
Using Split By
I can use Split By to target specific parts of our large block of text. So I can type "From:" and put a space in, and that's going to split the text around "From: " (From, colon, space). The first item is going to be blank. But the second item is going to be everything from that point onwards.
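For anyone who thinks in code, that first split can be sketched in Python, whose `str.split` behaves much like the Split By operator described here. The sample email text is made up for illustration and is not the text shown in the video.

```python
# Hypothetical sample text standing in for the email in the multiline input.
email_text = "From: Acme Store <orders@example.com>\nSubject: Order ID #48213"

# Split the whole text around the fixed label "From: ".
parts = email_text.split("From: ")

# parts[0] is the (empty) text before the label.
# parts[1] is everything after it. Bubble lists are 1-based, so this is
# what the video calls item #2.
rest = parts[1]
```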
So I can go item #2. Okay, let's prove it and test that everything's working. That gives me the rest of the text. I can then do Split By again, and this time I can use a space followed by the opening angle bracket. Basically, what I'm looking for is something that isn't going to change, something that's fixed every time this piece of text comes through. "From" is a label that is going to be fixed every time, and the formatting of the email address is going to be fixed every time too. So if I want to extract the name of the sender, I can split on the space and the angle bracket, and this time it's splitting the message at that point. Everything before is item one, and everything after, or at least until we get to another angle bracket, is item two, three and so on in our list. So I can just take the first item. And there you go: it has reliably extracted the business name or sender name from the From field. Let's do another example.
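The two chained splits can be sketched in Python like this. The function name `sender_name` and the sample text are hypothetical, chosen only to mirror the steps just described.

```python
def sender_name(email_text: str) -> str:
    """Return the text between the "From: " label and the " <" before the address."""
    # Step 1: everything after the fixed label (item #2 in Bubble terms).
    after_label = email_text.split("From: ")[1]
    # Step 2: everything before the space and opening angle bracket (first item).
    return after_label.split(" <")[0]


# Hypothetical sample text.
sample = "From: Acme Store <orders@example.com>\nSubject: Order ID #48213"
name = sender_name(sample)
```

This only works because both separators appear in the text exactly where we expect them, which is the point about fixed labels and fixed formatting.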
Let's say I want to extract the order number. So again, I look for structure that's going to be consistent. I can do Split By on the multiline text input again. The hypothetical scenario here is that I want to take details from an order email that's been passed into my Bubble app, and I can be fairly confident that the subject line of the order email is unlikely to change, apart from the order number, and that's the bit I want to extract. If I wanted to be really careful, I'd ask: are there any scenarios where what I'm splitting by is going to appear elsewhere in the text? By making the Split By text separator longer, I can reduce the chance that it's not going to work. So I can type "Order ID", and I can even add the space and the hash. Then in this instance I can just go item #2, because there's nothing left after the order number.
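To see why a longer separator is safer, here is a small Python sketch. The sample text is invented, and it deliberately puts a second hash in the sender name so that the short separator goes wrong.

```python
# Hypothetical sample text with a "#" in the sender name as well as the subject.
email_text = "From: Shop #1 <hello@example.com>\nSubject: Order ID #48213"

# Splitting on "#" alone hits the first hash, so item #2 is the wrong text.
short_result = email_text.split("#")[1]

# Splitting on the longer, more specific label only matches the subject line.
long_result = email_text.split("Order ID #")[1]
```

With the short separator, item #2 starts at the hash in the sender name. With the longer one it is just the order number.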
Let's hit preview. Okay, so order number is what I've entered in, and it extracts the order number reliably. Let's make it a little bit more interesting as a final example.
Let's put a blank line in and then add another piece of text. Okay, so how would I extract the order number now? Because if I go into preview, it's going to return everything after the separator. And so that's why I end up with the two lines.
Well, you can split by a new line. I'm still going to target item two, because that's everything after my Split By text separator. Then I can Split By again, and this time the separator is two line breaks, so a blank line. Oh, and I need to choose the first item. So I'm saying: when you find two line breaks, as we've got here, go with the first item, which is everything before them. And there we have it.
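In Python terms, that last example might look like the sketch below. The sample text is hypothetical, and it assumes the blank line is two plain `\n` characters. Text arriving from some sources uses `\r\n` line endings, in which case the separator would need to match that instead.

```python
# Hypothetical sample text: the order number is followed by a blank line and more text.
email_text = "Subject: Order ID #48213\n\nThanks for shopping with us."

# Item #2: everything after the fixed label, which still includes the trailing text.
after_label = email_text.split("Order ID #")[1]

# First item: everything before the two line breaks.
order_number = after_label.split("\n\n")[0]
```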
So those are just a couple of techniques that I've found for extracting data from a larger volume of text and taking out exactly what you need. The labels, or basically the locations in the text that you're splitting on, need to be 100% consistent for you to be able to reliably target what you're trying to extract.
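If you were doing this in code rather than in Bubble, one way to make that consistency requirement explicit is a small helper that reports when the label is missing instead of failing. This is only a sketch: `extract_between` is a hypothetical name, not a Bubble feature, and it uses Python's `str.partition`.

```python
from typing import Optional


def extract_between(text: str, start: str, end: str) -> Optional[str]:
    """Return the text after `start` and before `end`, or None if `start` is missing."""
    # partition returns (before, separator, after); separator is "" when not found.
    _, found, after = text.partition(start)
    if not found:
        return None
    # If `end` is missing, partition returns the whole remainder as the first part.
    value, _, _ = after.partition(end)
    return value


# Hypothetical sample text.
sample = "From: Acme Store <orders@example.com>\nSubject: Order ID #48213\n\nThanks!"
sender = extract_between(sample, "From: ", " <")
order = extract_between(sample, "Order ID #", "\n\n")
missing = extract_between("no labels in this text", "From: ", " <")
```

Returning `None` when the label is absent makes it obvious when an incoming message doesn't follow the expected format.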