In this Bubble tutorial video, I demonstrate the first part of using the AssemblyAI API to identify different speakers in an audio file and transcribe what each of them said. The process involves sending the API a link to an audio file and then retrieving a transcript that details who the speakers are and what they said.
Setting Up The AssemblyAI API in the Bubble API Connector
This tutorial addresses some topics covered in earlier videos where I used the AssemblyAI API. If you need a recap on any of the individual steps, I encourage you to go back and revisit those videos. However, I will be giving a detailed explanation of how I use the Bubble API Connector here.
I've added an API named AssemblyAI and entered my API key in the authorization header field. The call itself is a POST request to the AssemblyAI API, set to be used as an action so that I can run it from a workflow, with the body sent as JSON.
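For reference, here is a minimal sketch of the same configuration expressed as plain values (you don't write any code in Bubble; this just shows which value goes into which API Connector field). The endpoint URL and the bare `authorization` header are AssemblyAI's standard v2 API; the key shown is a placeholder.

```python
# Rough equivalent of the API Connector setup described above.
ASSEMBLYAI_API_KEY = "YOUR_ASSEMBLYAI_API_KEY"  # the key pasted into the authorization header field

# POST endpoint for creating a new transcript (AssemblyAI v2 API)
TRANSCRIPT_URL = "https://api.assemblyai.com/v2/transcript"

HEADERS = {
    "authorization": ASSEMBLYAI_API_KEY,  # AssemblyAI expects the raw key, no "Bearer" prefix
    "content-type": "application/json",   # because the body is sent as JSON
}
```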
Transcribing Audio Files with Speaker Labels
I have to give AssemblyAI a publicly accessible audio or video file to convert into a transcript, so I've uploaded an audio file to my Bubble app's storage and use its direct link. The one step that differentiates this from my earlier AssemblyAI videos is that I've added speaker_labels set to true in the JSON body.
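The body only needs the file URL and the speaker_labels flag. Here is a rough Python equivalent of the call the API Connector makes; the audio URL is a made-up placeholder standing in for the direct link to the file in Bubble's storage, and the key is your own.

```python
import requests

HEADERS = {
    "authorization": "YOUR_ASSEMBLYAI_API_KEY",
    "content-type": "application/json",
}

body = {
    # Publicly accessible file — in Bubble, this is the direct link to the uploaded audio file
    "audio_url": "https://example.com/uploads/two-speaker-sample.mp3",
    # The addition compared to the earlier videos: turn on speaker identification
    "speaker_labels": True,
}

response = requests.post("https://api.assemblyai.com/v2/transcript", json=body, headers=HEADERS)
print(response.json()["id"])  # the unique identifier for this transcript
```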
When I initialize this call, the API Connector returns an ID, which serves as the unique identifier for the transcript. Once AssemblyAI has finished processing, you can either have them notify a webhook or fetch the transcript using this ID.
To keep things simple, I'll fetch the transcript with a second call in the Bubble API Connector, a GET request that takes the transcript ID. The AssemblyAI documentation explains this in greater detail, but for the purpose of this demonstration, I've condensed it here.
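Outside Bubble, that second call is a GET to AssemblyAI's v2 transcript endpoint with the ID appended, polled until the status becomes completed. A minimal sketch, assuming a placeholder ID from the first call:

```python
import time
import requests

HEADERS = {"authorization": "YOUR_ASSEMBLYAI_API_KEY"}
transcript_id = "TRANSCRIPT_ID_FROM_THE_FIRST_CALL"  # placeholder

# GET /v2/transcript/{id} returns the transcript along with a status field
while True:
    result = requests.get(
        f"https://api.assemblyai.com/v2/transcript/{transcript_id}",
        headers=HEADERS,
    ).json()
    if result.get("status") in ("completed", "error"):
        break
    time.sleep(3)  # wait a few seconds before checking again
```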
Displaying Transcripts with Speaker Labels
I initialize the call and get back my transcript, which clearly separates the statements made by the different speakers in the conversation. For instance, it begins with 'Hello, my name is Bob, I'm speaker one,' followed by 'Hello, my name is Emma, I'm speaker two.'
The transcript is grouped under 'utterances', indicating the individual portions of the conversation. If I were to look at the raw data, it would show separate utterance entries, each corresponding to one speaker's turn.
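To make that grouping concrete, the relevant part of the completed transcript looks roughly like this. Note that the speaker names are not in the response; AssemblyAI labels speakers A, B, and so on, and the example text below is made up to mirror the demo file.

```python
# Sketch of the speaker-labelled sections in the completed transcript.
# 'result' stands in for the JSON returned by the get-transcript call above.
result = {
    "utterances": [
        {"speaker": "A", "text": "Hello, my name is Bob. I'm speaker one."},
        {"speaker": "B", "text": "Hello, my name is Emma. I'm speaker two."},
    ]
}

for utterance in result["utterances"]:
    print(f"Speaker {utterance['speaker']}: {utterance['text']}")
```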
So, to sum up, part one of this tutorial focuses on getting back a JSON response that identifies the different speakers in an automated AI audio transcription.
Upcoming Content
Stay tuned for part two, where I will guide you through saving this data into the Bubble database, extracting the different parts of your conversation, and displaying them effectively.