Bubble with Speech to Text using AssemblyAI - Part 2


In part 2 of the AssemblyAI and Bubble.io tutorial series, we explore how to use webhooks to inform your Bubble app when a transcript has been completed.

Welcome to part two of our miniseries looking at speech to text or speech recognition APIs and how you can easily add them into your Bubble app.

Recap of Part 1

In part one, we worked through the AssemblyAI API documentation and set up two calls in our API Connector for AssemblyAI: one sends off our audio file, and the other fetches the results once AssemblyAI has finished processing the file. Remember, this is different from Whisper: you don't get the response back while you wait. You have to check back once the transcript is complete, but you can automate this process using webhooks.
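As a reminder of what those two calls boil down to, here's a rough sketch of the requests from part one against AssemblyAI's documented v2 endpoints. The API key, audio URL, and transcript ID are placeholders; in Bubble these live in the API Connector rather than in code.

```python
import json

API_BASE = "https://api.assemblyai.com/v2"

def submit_call(audio_url):
    """The first call: POST the audio file off for transcription."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/transcript",
        "headers": {"authorization": "YOUR_API_KEY"},
        "body": json.dumps({"audio_url": audio_url}),
    }

def fetch_call(transcript_id):
    """The second call: GET the results once processing has finished."""
    return {
        "method": "GET",
        "url": f"{API_BASE}/transcript/{transcript_id}",
        "headers": {"authorization": "YOUR_API_KEY"},
    }
```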

So if we go back into the AssemblyAI documentation, we can look at the section called Using Webhooks. From the example there, we can see that we can add a webhook URL into our JSON. This is basically saying: when we send off, say, the MP3 file we want transcribed, let this endpoint know when the transcript is ready. So I'm going to add it in here; in fact, I'm going to add a space just so that the JSON is a little clearer.
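Here's roughly what the submission body looks like once the documented `webhook_url` field is added. The Bubble endpoint URL and audio URL below are placeholders; yours will depend on your app name and the backend workflow you set up.

```python
import json

# Illustrative submission body: the audio to transcribe plus the endpoint
# AssemblyAI should notify when the transcript is ready. Both URLs are
# placeholders.
body = json.dumps({
    "audio_url": "https://example.com/tiny-demo.mp3",
    "webhook_url": "https://yourapp.bubbleapps.io/version-test/api/1.1/wf/inbound_transcript",
}, indent=2)
```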

Receiving webhook data with backend workflows in Bubble

As for my webhook, I'm going to make this dynamic so that it adapts to whether I'm using the development (test) version of my Bubble app or the live version. I'm just going to call this the webhook URL. So how do I set up the inbound webhook? Well, I go into API and enable the Workflow API and backend workflows, which then gives me an option up here. Now I create a new API workflow, which I'm going to call inbound_transcript; I have to write it without spaces if I'm going to make it a public API, which I need in order for AssemblyAI to be able to send the notification to me. I'm then going to set it to run without authentication, because the AssemblyAI webhook isn't going to be able to authenticate. You might want to obscure the endpoint name, because there is a chance that someone could send data into your app if they were to guess it; it's up to you to judge how risky that is. But for this demonstration, we're going to have it public and run without authentication.
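Bubble exposes public backend workflows at a predictable URL, which is what makes the dynamic webhook URL possible. A small sketch of that pattern, with "yourapp" and the workflow name as placeholders; the version-test segment targets the development branch, and dropping it targets live.

```python
def workflow_url(app_name: str, workflow: str, live: bool = False) -> str:
    """Build a Bubble public workflow endpoint URL for dev or live."""
    base = f"https://{app_name}.bubbleapps.io"
    if not live:
        base += "/version-test"  # development/test branch of the app
    return f"{base}/api/1.1/wf/{workflow}"
```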

Detecting data with backend workflows

Now I need to demo that. So I'm going to change this to detect request and copy this; in actual fact, I'll open it in a new tab. I'm doing this because I need to teach Bubble what the inbound request looks like, and the easiest way to do that is to use detect, but that only works while this box is up. I'm then going to go into my APIs and my AssemblyAI API. Right, here we go. I paste my webhook URL in there and reinitialize. This uploads the tiny audio file I'm demoing this process with again, and this time it also supplies the webhook; we can see that the status is queued. I'm then going to go back over here. Because it's a very small file, AssemblyAI has responded incredibly quickly, and it's told me that the status for this transcript ID is now completed. What do I do next?
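To make the Detect Request step concrete, here's a sketch of the kind of JSON AssemblyAI posts to the webhook endpoint when a transcript finishes, and how a handler reads it. The transcript ID is a made-up example; in Bubble this parsing is done for you once detection has captured the payload shape.

```python
import json

def handle_webhook(raw_body: str):
    """Pull the transcript ID out of AssemblyAI's completion notification."""
    data = json.loads(raw_body)
    if data.get("status") == "completed":
        return data["transcript_id"]  # fetch this transcript next
    return None  # not a completed transcript; nothing to fetch

# Illustrative payload, shaped like the notification the demo receives.
sample = '{"transcript_id": "abc123", "status": "completed"}'
```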

Well, I go fetch the transcript by its ID. And how do I do that? Well, I've already set up the API call: I did this in the earlier video to get a processed transcript.

Fetching a completed transcript

Because it is set up as an action, I can go into the workflow, go to plugins, and add Get Processed Transcript. Instead of the test data I used before, I can use the request data's transcript ID, because that is the data AssemblyAI is sending to this endpoint. That sends out another request which gets my transcript. For the sake of keeping it simple, I'm going to save it as a message, in the content field. This is set up from a previous video where I was demoing ChatGPT, but I'm just going to test it using this content and then the text. Okay, so that should work: I upload my file; AssemblyAI notifies my Bubble app through this webhook endpoint; my app sends an outbound request for the transcript using the transcript ID; and I create a new message with the content of my speech to text.
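The full round trip described above can be sketched in two small helpers: build the GET URL for a transcript from the ID the webhook delivered, then keep only the transcript text from the response. The response snippet in the test is illustrative, but the `text` field is where AssemblyAI returns the finished transcript.

```python
import json

def transcript_url(transcript_id: str) -> str:
    # The GET endpoint for a single transcript, keyed by its ID.
    return f"https://api.assemblyai.com/v2/transcript/{transcript_id}"

def extract_text(response_body: str) -> str:
    # The finished transcript lives in the "text" field of the response;
    # this is what gets saved as the message content in the demo.
    return json.loads(response_body)["text"]
```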

Let's test it. So I'm going to reinitialize the call here again. My file is uploaded. Okay, I've made a mistake: I need to get rid of "initialize" here.

Right, reinitialize the call. You can see that it's been queued up. Then I can just go into Data, App data, messages. This is all from a previous demo where I was using GPT-4, but there we go: we saw the transcript come in in front of our eyes. So I can now confidently say that every step of my workflow is working.

I think this is where we're going to wrap up part two. In part three, I'll demo how to build the front end so that a user can upload a file, or maybe even record one, send it off to AssemblyAI, and see the results displayed back.