Text to Audio

The Text to Audio processor converts input text into spoken audio and saves the result as an audio file. Applications can use it to generate audio content from text at runtime.

Supported Input Port:

text: The processor accepts input through the “text” port. The input must be a string containing the text to convert to audio.

Supported Output Port:

filepath: The processor produces output through the “filepath” port. The output is the file path of the generated audio file.

List of Implementations:

Azure Implementation

The Azure implementation of the Text to Audio processor uses Azure’s text-to-speech service to synthesize the input text into audio.

Metadata

Field	Type	Description
voice_name	string (optional)	The name of the voice to be used for the audio output. The default value is “en-US-AriaNeural”.
azure_region	string (optional)	The Azure region where the text-to-speech service is located. The default value is “eastus”.
azure_api_key	string (required)	The API key for accessing Azure’s text-to-speech service. There is no default value; you must provide your own key.
output_filename	string (optional)	The name of the output audio file. The default value is “output.mp3”.
output_folder	string (optional)	The folder where the generated audio file will be stored. The default value is “assets”.

Sample processor configuration:

NOTE: A processor is always added to a module (Input or Output), and the module is then added to the pipeline.

{
    "processor_type": "text_to_audio",
    "processor_implementation_type": "text_to_audio_with_azure",
    "input_port": "text",
    "output_port": "filepath",
    "metadata": {}
}
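The sample above leaves metadata empty. A variant that sets every field from the Metadata table explicitly might look like the following sketch. The values shown are the documented defaults, and "<your-azure-api-key>" is a placeholder for your own key, not a real value. The sketch assumes the metadata keys are accepted exactly as they are named in the table.

```json
{
    "processor_type": "text_to_audio",
    "processor_implementation_type": "text_to_audio_with_azure",
    "input_port": "text",
    "output_port": "filepath",
    "metadata": {
        "voice_name": "en-US-AriaNeural",
        "azure_region": "eastus",
        "azure_api_key": "<your-azure-api-key>",
        "output_filename": "output.mp3",
        "output_folder": "assets"
    }
}
```

Any optional field you omit falls back to the default listed in the Metadata table.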
