URL to HTML#

The URL to HTML Processor is a processor that fetches HTML content from a specified URL. It is a crucial component for web scraping tasks, allowing applications to retrieve and process web page content.

Supported Input Port:

httpurl: The URL to HTML Processor accepts input through the “httpurl” port. The input should be a string representing the URL from which the HTML content needs to be fetched.

Supported Output Port:

html: The processor produces output through the “html” port. The output is the HTML content of the fetched web page.

List of Implementations:#

Request Implementation#

The Request implementation of the URL to HTML Processor uses the Request library to send HTTP requests and fetch HTML content.

Metadata

Sample processor configuration:#

NOTE: Processor is always added to a module(Input or Output). The module is then added to the pipeline.

 {
    "processor_type": "url_to_html",
    "processor_implementation_type": "url_to_html_with_request",
    "input_port": "httpurl",
    "output_port": "html",
    "metadata": {},
}