Audio To Text#

The Audio to Text Processor is a component in the OpenPlugin system that converts audio input into textual output. It plays a crucial role in extracting spoken words from audio files and transforming them into written text.

Supported Input Port

audio: The Audio to Text Processor accepts input through the “audio” port. The input should be an audio file or a reference to an audio file.

Supported Output Port

text: The processor produces output through the “text” port. The output is a string representing the transcribed text extracted from the audio input.

List of Implementations#

Whisper Implementation#

The Whisper implementation of the Audio to Text Processor utilizes OpenAI’s Whisper model to convert audio to text. Whisper is a state-of-the-art speech recognition model that can accurately transcribe spoken words from audio files.

Metadata

Field

Type

Description

openai_api_key

string (required)

The API key for accessing OpenAI’s Whisper ASR system. This key is user-provided.

model_name

string (optional)

The name of the model to be used for the audio to text conversion. The default value is “whisper-1”.

Sample processor configuration:#

NOTE: Processor is always added to a module(Input or Output). The module is then added to the pipeline.

 {
    "processor_type": "file_to_text",
    "processor_implementation_type": "file_to_text_with_langchain",
    "input_port": "filepath",
    "output_port": "text",
    "metadata": {},
}