There are some incredibly powerful speech-to-text APIs that you can connect to your Bubble app, allowing your users to upload audio and video files and create transcripts from them. We've already got a video showing you how to use the Whisper API to convert speech to text, but I want this video to be a quick comparison with another service, AssemblyAI.
No Code transcription API - price per minute
The first key difference is price. Measured per minute, the Whisper API costs $0.006, whereas the AssemblyAI API costs $0.015. So AssemblyAI is roughly two and a half times the price of Whisper.
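To make that price gap concrete, here is a small Python sketch that applies the per-minute prices quoted above to an hour of audio. The constant and function names are my own, and the prices are the ones quoted in this video, so check the current pricing pages before relying on them.

```python
# Per-minute prices as quoted in this video (verify against current pricing pages).
WHISPER_USD_PER_MIN = 0.006
ASSEMBLYAI_USD_PER_MIN = 0.015


def transcription_cost(minutes: float, usd_per_min: float) -> float:
    """Cost in USD of transcribing `minutes` of audio at a flat per-minute rate."""
    return minutes * usd_per_min


if __name__ == "__main__":
    minutes = 60  # one hour of audio
    whisper = transcription_cost(minutes, WHISPER_USD_PER_MIN)
    assembly = transcription_cost(minutes, ASSEMBLYAI_USD_PER_MIN)
    print(f"Whisper:    ${whisper:.2f}")
    print(f"AssemblyAI: ${assembly:.2f}")
    print(f"Ratio:      {assembly / whisper:.1f}x")
```

For one hour of audio that comes to $0.36 with Whisper versus $0.90 with AssemblyAI.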
OpenAI's Whisper API limitation - file size
But Whisper has some limitations that led me to use AssemblyAI in a side project I'm building. The first is file size: Whisper currently does not accept files larger than 25 MB. That's particularly difficult if you want to transcribe videos, as an HD video can easily exceed 25 MB. And if you're a no-code builder like me, you probably won't have the technical skills to add a library that handles compression. You'd have to use another external service to compress your audio or video files, and that increases your overall cost.
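As a rough illustration of why video hits that limit so quickly, here is a Python sketch that estimates a file's size from its average bitrate and picks a service accordingly. The 5 Mbit/s bitrate in the example is an assumed figure for HD video, not a measured one; reading "25 MB" as 25,000,000 bytes is a conservative assumption; and the function names and routing rule are hypothetical.

```python
# Whisper upload limit as quoted in this video. Treating "25 MB" as
# 25,000,000 bytes is an assumption (the stricter of the two common readings).
WHISPER_MAX_BYTES = 25 * 1000 * 1000


def estimate_size_bytes(bitrate_bps: int, seconds: int) -> int:
    """Estimate a file's size from its average bitrate in bits per second."""
    return bitrate_bps * seconds // 8


def choose_service(file_size_bytes: int) -> str:
    """Hypothetical routing rule: small files to Whisper, larger ones to AssemblyAI."""
    if file_size_bytes <= WHISPER_MAX_BYTES:
        return "whisper"
    return "assemblyai"


if __name__ == "__main__":
    # Assumed 5 Mbit/s HD video, one minute long.
    size = estimate_size_bytes(5_000_000, 60)
    print(size, choose_service(size))
```

At that assumed bitrate, a single minute of video comes to 37,500,000 bytes, already past the limit.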
That said, the Whisper API accepts a good range of common formats, and I have to say it has the edge on speed: when you send a request from the Bubble API Connector to Whisper, you get a response back very quickly. However, Bubble sits there waiting for that response, and that is actually one of Whisper's shortfalls compared to AssemblyAI.
AssemblyAI has webhooks
Looking at AssemblyAI's pricing, they charge per second, but as I said, that works out at $0.015 a minute, roughly two and a half times the price of the OpenAI Whisper API. Here is where AssemblyAI earns its keep, though. Although it takes slightly longer to process, it can send you the result, or at least notify you that the transcript is ready, using a webhook. This means the Bubble API Connector is not left waiting for a response from AssemblyAI. So even if processing takes five minutes because you've uploaded a huge audio file, your Bubble app can receive the notification, check that the transcript is ready, and then fetch and process the data. Your users aren't left waiting with the loading bar creeping across the top, and you're not restricted by the fact that, at the time of recording, the Bubble API Connector times out after 50 to 60 seconds. So large files sent to Whisper through the Bubble API Connector are unlikely to work well, and you will probably find AssemblyAI much more reliable for them.
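Outside Bubble, the same webhook pattern looks roughly like this Python sketch. It builds the JSON body for AssemblyAI's `POST https://api.assemblyai.com/v2/transcript` endpoint with a `webhook_url`, then parses the notification AssemblyAI posts back. The notification shape (`transcript_id` and `status` fields) is my understanding of AssemblyAI's docs and should be checked; the helper names and example URLs are hypothetical, and no network calls are made.

```python
import json
from typing import Optional

ASSEMBLYAI_TRANSCRIPT_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(audio_url: str, webhook_url: str) -> dict:
    """JSON body for POST /v2/transcript; AssemblyAI calls webhook_url when done."""
    return {"audio_url": audio_url, "webhook_url": webhook_url}


def handle_webhook(raw_body: bytes) -> Optional[str]:
    """Return the transcript id if the notification reports success, else None.

    Assumed notification shape: {"transcript_id": "...", "status": "completed"}.
    """
    payload = json.loads(raw_body)
    if payload.get("status") == "completed":
        return payload.get("transcript_id")
    return None


def transcript_fetch_url(transcript_id: str) -> str:
    """URL to GET (with your API key in the authorization header) for the transcript."""
    return f"{ASSEMBLYAI_TRANSCRIPT_URL}/{transcript_id}"


if __name__ == "__main__":
    body = build_transcript_request(
        "https://example.com/interview.mp3",       # placeholder audio URL
        "https://example.com/assemblyai-webhook",  # placeholder webhook endpoint
    )
    print(json.dumps(body))
    tid = handle_webhook(b'{"transcript_id": "abc123", "status": "completed"}')
    if tid is not None:
        print(transcript_fetch_url(tid))
```

In Bubble, the webhook URL would presumably be a backend workflow exposed as an API endpoint, which then makes a GET call through the API Connector to fetch the finished transcript.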
AssemblyAI has loads of extra features built in
Finally, AssemblyAI comes with lots of extra features baked in, which you can see under Audio Intelligence. If you've got a transcript back from Whisper, you could pass it into ChatGPT (GPT-3.5-turbo or GPT-4) and ask OpenAI's text generation service to produce a summary or a sentiment analysis. With AssemblyAI, that's built in: you make one API request with an audio file and you can get back a text summary and more. Other features listed on this page include chapter detection, redaction of personal information and topic detection. It can also identify speakers. Some of the transcription services we use at Planet No Code label different speakers, and I believe that is possible with AssemblyAI too.
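Turning these extras on is a matter of adding flags to the same transcript request. The Python sketch below builds such a request body. The parameter names (`speaker_labels`, `sentiment_analysis`, `iab_categories` for topic detection, `auto_chapters` for chapter detection, `redact_pii` with `redact_pii_policies`) are as I understand AssemblyAI's v2 API, so verify them against the current docs before use; the helper name and the choice of PII policies are hypothetical.

```python
import json


def build_intelligence_request(audio_url: str) -> dict:
    """Hypothetical helper: transcript request body with some Audio Intelligence features enabled."""
    return {
        "audio_url": audio_url,
        "speaker_labels": True,      # label who is speaking
        "sentiment_analysis": True,  # sentiment analysis
        "iab_categories": True,      # topic detection
        "auto_chapters": True,       # chapter detection
        "redact_pii": True,          # redact personal information...
        # ...of these types (example policies; check the docs for the full list)
        "redact_pii_policies": ["person_name", "phone_number", "email_address"],
    }


if __name__ == "__main__":
    body = build_intelligence_request("https://example.com/interview.mp3")
    print(json.dumps(body, indent=2))
```

The design point is that you send this one body instead of chaining a separate summarisation or sentiment call to another service after transcription.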
So there you have it. I just wanted to give a quick summary of the journey we've been on: being incredibly excited and amazed by how accurate the Whisper API is, then running into these issues that restricted what we were trying to build, and then finding AssemblyAI, which I'm immensely impressed with. For the project we're building, it's well worth the extra cost to be able to work with it more leanly and with fewer errors in Bubble.