Interface LinkExtractor

All Known Implementing Classes:
ContentInternalLinks, IndexLinks, RichTextInternalLinks

public interface LinkExtractor
A link extractor fetches all hyperlinks from a JSON content response that point to other parts of the Site API of the same site, so that crawling can continue with them.

Link extractors should ignore external URLs or URLs pointing to assets.
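
The filtering rule above can be sketched as follows. This is only an illustration under stated assumptions, not the behavior of any of the known implementing classes: the class name InternalLinkFilter, the asset path prefix /content/dam/ and the example host are all hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.stream.Stream;

/**
 * Hypothetical helper (not part of the documented API): keeps only links
 * that stay on the same site and do not point to assets.
 */
final class InternalLinkFilter {

  // Assumption: asset URLs are recognized by this path prefix.
  private static final String ASSET_PREFIX = "/content/dam/";

  private final String siteHost;

  InternalLinkFilter(String siteHost) {
    this.siteHost = siteHost;
  }

  /** Returns true for same-site links that do not point to assets. */
  boolean isCrawlable(String link) {
    URI uri;
    try {
      uri = new URI(link);
    } catch (URISyntaxException ex) {
      return false; // malformed links are skipped
    }
    if (uri.isOpaque()) {
      return false; // e.g. mailto: links have no path to crawl
    }
    String host = uri.getHost();
    // Relative links (no scheme, no host) are treated as internal;
    // absolute links must match the site host.
    boolean internal = (uri.getScheme() == null && host == null)
        || (host != null && host.equalsIgnoreCase(siteHost));
    String path = uri.getPath();
    boolean asset = path != null && path.startsWith(ASSET_PREFIX);
    return internal && !asset;
  }

  /** Applies the rule to a stream of candidate links. */
  Stream<String> filter(Stream<String> candidates) {
    return candidates.filter(this::isCrawlable);
  }
}
```

An implementation of getLinks could apply such a filter to the values it reads from the JSON document before returning them.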

  • Method Summary

    Modifier and Type
    Method
    Description
    boolean
    accept(String suffix)
    Returns true if the link extractor accepts the given suffix, i.e. if it supports the JSON response of the processor mapped to this suffix.
    Stream<String>
    getLinks(com.jayway.jsonpath.DocumentContext jsonPathContext)
    Retrieves links from the JSON document via JSON path.
  • Method Details

    • accept

      boolean accept(String suffix)
      Returns true if the link extractor accepts the given suffix, i.e. if it supports the JSON response of the processor mapped to this suffix.
      Parameters:
      suffix - Suffix that the processor is mapped to
      Returns:
      true if the JSON response of this processor is supported
    • getLinks

      Stream<String> getLinks(com.jayway.jsonpath.DocumentContext jsonPathContext)
      Retrieves links from the JSON document via JSON path.
      Parameters:
      jsonPathContext - JSON path document context of the JSON document
      Returns:
      Link URLs
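
The accept contract described above can be illustrated with a minimal sketch. The class name SuffixAcceptor and the suffix values used with it are hypothetical; the known implementing classes may decide differently.

```java
import java.util.Set;

/** Hypothetical helper illustrating the accept(String suffix) contract. */
final class SuffixAcceptor {

  // Assumption: suffixes of the processors whose JSON responses are understood.
  private final Set<String> supportedSuffixes;

  SuffixAcceptor(Set<String> supportedSuffixes) {
    this.supportedSuffixes = Set.copyOf(supportedSuffixes);
  }

  /** Returns true only if the suffix belongs to a supported processor. */
  boolean accept(String suffix) {
    // Null check first: immutable sets reject null lookups.
    return suffix != null && supportedSuffixes.contains(suffix);
  }
}
```

A crawler could call accept first and invoke getLinks only on the extractors that returned true for the suffix of the current response.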