Skip to main content

Transcribe an audio file to text

info

This example uses the Interval Python SDK. This SDK is in private beta—get in touch if you're interested in trying it out!

Use Interval's handy file upload functionality to quickly build a tool for transcribing an audio file to text. Interval is flexible enough to build on this functionality to fit your team's working style. Index meeting notes, post updates to Slack, or summarize YouTube videos.

Get started with this example
interval.com

How it works

This examples uses Whisper, OpenAI's open source speech recognition model. Install the library and ensure you have its ffmpeg command-line dependancy installed too.

First, we'll leverage the io.input.file I/O method to collect an audio file to transcribe. Using the allowed_extensions option, we'll ensure that the provided file has an audio content.

import os
from interval_sdk import Interval, IO, io_var, ctx_var

interval = Interval(
os.environ.get('INTERVAL_API_KEY'),
)

@interval.action
async def transcribe_audio(io: IO):

file = await io.input.file(
"Upload a file to transcribe", allowed_extensions=[".wav", ".mp3"]
)

return "All done!"

interval.listen();

If you test out this tool with your personal development key, you'll be able to find it in your Development environment. When run, the action provides a friendly uploader for submitting the audio file.

Now that our action has a file to operate on, we can transcribe it using the machine learning model.

We'll add code to load the base version of the Whisper model. This will run once when you boot up your action (and will take some time to download the model the first time it runs).

Then, once a file is provided, we'll pass it to the model to transcribe its text content and display the results.

import os
from interval_sdk import Interval, IO, io_var, ctx_var

import whisper

interval = Interval(
os.environ.get('INTERVAL_API_KEY'),
)

model = whisper.load_model("base")

@interval.action
async def transcribe_audio(io: IO):

file = await io.input.file(
"Upload a file to transcribe", allowed_extensions=[".wav", ".mp3"]
)

with open(file.name, "wb") as f:
f.write(file.read())

result = model.transcribe(file.name)

await io.group(
io.display.heading("Transcription results"),
io.display.markdown(result["text"])
)

return "All done!"

interval.listen();

This works great! But we can improve things a bit.

First, we'll add a loading spinner to run while the model is running, to provide context around what's happening.

Next, our current implementation saves the uploaded file locally to simplify the transcription call, but with a bit more legwork we can avoid this and pass the file's contents directly to the model as bytes. For this to work, we'll use ffmpeg to preprocess the file and numpy to reformat the audio into an array of floats representing the audio waveform.

import os
from typing import cast

from interval_sdk import Interval, IO, ctx_var
from interval_sdk.internal_rpc_schema import ActionContext

import whisper
import numpy as np
import ffmpeg

interval = Interval(
os.environ.get('INTERVAL_API_KEY'),
)

model = whisper.load_model("base")

def audio_from_bytes(inp: bytes, sr: int = 16000):
try:
out, _ = (
ffmpeg.input('pipe:', threads=0)
.output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=sr)
.run(cmd="ffmpeg", capture_stdout=True, capture_stderr=True, input=inp)
)
except ffmpeg.Error as e:
raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e

return np.frombuffer(out, np.int16).flatten().astype(np.float32) / 32768.0

@interval.action
async def transcribe_audio(io: IO):

file = await io.input.file(
"Upload a file to transcribe", allowed_extensions=[".wav", ".mp3"]
)

with open(file.name, "wb") as f:
f.write(file.read())

result = model.transcribe(file.name)
ctx = cast(ActionContext, ctx_var.get())
await ctx.loading.start("Transcribing audio...")

audio = audio_from_bytes(file.read())
result = model.transcribe(audio)

await io.group(
io.display.heading("Transcription results"),
io.display.markdown(result["text"])
)

return "All done!"

interval.listen();

API methods used

Was this section useful?

548 Market St PMB 31323
San Francisco, CA 94104

© 2023

Join our mailing list

Every Friday we send an email with the latest from Interval, including events, product updates, SDK releases, and more.