Text to Audio

The Text to Audio Processor converts text input into spoken audio output, so applications can generate audio content dynamically.

Supported Input Port:

text: The processor accepts input through the “text” port. The input must be a string containing the text to convert to audio.

Supported Output Port:

filepath: The processor produces output through the “filepath” port. The output is the file path of the generated audio file.

List of Implementations:

Azure Implementation

The Azure implementation of the Text to Audio Processor uses Azure’s text-to-speech service to convert text to audio.
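To make the conversion step concrete, here is a minimal sketch of how a request to Azure's text-to-speech REST endpoint could be assembled from a text string, a voice name, a region, and an API key. The function name `build_tts_request`, the fixed `en-US` SSML language, the chosen MP3 output format, and the `User-Agent` string are assumptions for illustration; the processor's actual implementation may differ, for example by using the Azure Speech SDK instead of raw HTTP.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr


def build_tts_request(text, azure_api_key,
                      voice_name="en-US-AriaNeural", azure_region="eastus"):
    """Build (but do not send) a request for Azure's text-to-speech REST endpoint."""
    url = f"https://{azure_region}.tts.speech.microsoft.com/cognitiveservices/v1"
    # SSML body: the <voice> element selects the voice; the text is XML-escaped.
    ssml = (
        "<speak version='1.0' xml:lang='en-US'>"
        f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
        "</speak>"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": azure_api_key,
        "Content-Type": "application/ssml+xml",
        # Assumed output format; chosen here because the default file is an .mp3.
        "X-Microsoft-OutputFormat": "audio-16khz-128kbitrate-mono-mp3",
        "User-Agent": "text-to-audio-sketch",
    }
    return urllib.request.Request(
        url, data=ssml.encode("utf-8"), headers=headers, method="POST"
    )
```

Sending the request with `urllib.request.urlopen(request)` and writing the response bytes to disk would yield the audio file. Building the request separately from sending it keeps the sketch easy to inspect without a valid key.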

Metadata

| Field | Type | Description |
| --- | --- | --- |
| voice_name | string (optional) | The name of the voice to be used for the audio output. The default value is “en-US-AriaNeural”. |
| azure_region | string (optional) | The Azure region where the text-to-speech service is located. The default value is “eastus”. |
| azure_api_key | string (required) | The API key for accessing Azure’s text-to-speech service. This key is user-provided. |
| output_filename | string (optional) | The name of the output audio file. The default value is “output.mp3”. |
| output_folder | string (optional) | The folder where the generated audio file will be stored. The default value is “assets”. |
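The defaults and the single required field listed above can be summarized in a short sketch. The helper names `resolve_metadata` and `output_filepath` are hypothetical, and joining `output_folder` with `output_filename` to form the value emitted on the “filepath” port is an assumption about how the processor composes its output path.

```python
import os

# Defaults taken from the metadata fields documented above.
DEFAULT_METADATA = {
    "voice_name": "en-US-AriaNeural",
    "azure_region": "eastus",
    "output_filename": "output.mp3",
    "output_folder": "assets",
}


def resolve_metadata(metadata):
    """Hypothetical helper: apply documented defaults and check the required key."""
    if not metadata.get("azure_api_key"):
        raise ValueError("azure_api_key is required")
    # User-supplied values override the defaults.
    return {**DEFAULT_METADATA, **metadata}


def output_filepath(resolved):
    """Assumed composition of the path emitted on the "filepath" port."""
    return os.path.join(resolved["output_folder"], resolved["output_filename"])
```

For example, `resolve_metadata({"azure_api_key": "<your-azure-api-key>"})` would fill in every optional field with its default, while omitting the key raises a `ValueError`.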

Sample processor configuration:

NOTE: A processor is always added to a module (Input or Output). The module is then added to the pipeline.

{
    "processor_type": "text_to_audio",
    "processor_implementation_type": "text_to_audio_with_azure",
    "input_port": "text",
    "output_port": "filepath",
    "metadata": {}
}
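The sample above leaves `metadata` empty. As a hedged illustration, the following sketch builds the same configuration with every documented metadata field filled in and serializes it to JSON. The API key is a placeholder, the other values are simply the documented defaults, and the variable name `processor_config` is illustrative only.

```python
import json

processor_config = {
    "processor_type": "text_to_audio",
    "processor_implementation_type": "text_to_audio_with_azure",
    "input_port": "text",
    "output_port": "filepath",
    "metadata": {
        "azure_api_key": "<your-azure-api-key>",  # placeholder; required field
        "voice_name": "en-US-AriaNeural",         # documented default
        "azure_region": "eastus",                 # documented default
        "output_filename": "output.mp3",          # documented default
        "output_folder": "assets",                # documented default
    },
}

# json.dumps emits strict JSON, so no trailing commas can slip in.
print(json.dumps(processor_config, indent=4))
```

Generating the configuration from a dictionary rather than writing JSON by hand avoids syntax slips such as trailing commas, which strict JSON parsers reject.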