Interface LinkExtractor

  • All Known Implementing Classes:
    ContentInternalLinks, IndexLinks, RichTextInternalLinks

    public interface LinkExtractor
    A link extractor collects the hyperlinks in a JSON content response that point to other parts of the same site's Site API, so that the crawler can continue from them.

    Link extractors should ignore external URLs or URLs pointing to assets.
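
The filtering rule above can be sketched with the standard library alone. This is a minimal illustration, not the behavior of any implementing class listed here: the class name `LinkFilterSketch`, the `/assets/` path prefix, and the host comparison are all assumptions chosen for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LinkFilterSketch {

    // Hypothetical asset prefix; the real value depends on the site setup.
    static final String ASSET_PREFIX = "/assets/";

    /** Keeps only links that stay on siteHost and do not point below ASSET_PREFIX. */
    static Stream<String> internalLinks(Stream<String> candidates, String siteHost) {
        return candidates.filter(link -> isInternal(link, siteHost));
    }

    static boolean isInternal(String link, String siteHost) {
        final URI uri;
        try {
            uri = new URI(link);
        } catch (URISyntaxException e) {
            return false; // unparseable values are not followed
        }
        if (uri.isOpaque()) {
            return false; // e.g. mailto: or tel: links
        }
        String path = uri.getPath();
        if (path == null || path.isEmpty()) {
            return false; // nothing to crawl
        }
        String host = uri.getHost();
        // Relative links have no host and are treated as belonging to the same site.
        boolean sameSite = host == null || host.equalsIgnoreCase(siteHost);
        boolean isAsset = path.startsWith(ASSET_PREFIX);
        return sameSite && !isAsset;
    }

    public static void main(String[] args) {
        List<String> kept = internalLinks(
                Stream.of("/api/page.json", "https://other.org/x", "/assets/img.png"),
                "example.com").collect(Collectors.toList());
        System.out.println(kept);
    }
}
```

A real extractor would apply a check like this to the values it reads from the JSON document before returning them from getLinks.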

    • Method Summary

      Modifier and Type | Method | Description
      boolean | accept(java.lang.String suffix) | Returns true if the link extractor accepts the given suffix (processor mapped to this suffix).
      java.util.stream.Stream<java.lang.String> | getLinks(com.jayway.jsonpath.DocumentContext jsonPathContext) | Retrieves links from the JSON document via JSON path.
    • Method Detail

      • accept

        boolean accept(java.lang.String suffix)
        Returns true if the link extractor accepts the given suffix, that is, if it can handle the JSON response of the processor mapped to this suffix.
        Parameters:
        suffix - the suffix to which a processor is mapped
        Returns:
        true if the JSON response of the processor mapped to this suffix is supported
      • getLinks

        java.util.stream.Stream<java.lang.String> getLinks(com.jayway.jsonpath.DocumentContext jsonPathContext)
        Retrieves links from the JSON document via JSON path.
        Parameters:
        jsonPathContext - the JsonPath document context of the JSON document
        Returns:
        a stream of link URLs
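
As a rough illustration of how accept might be used by a caller, the sketch below picks the extractors that apply to a given suffix. To stay free of third-party dependencies it uses a stand-in interface, `SuffixAware`, that mirrors only the accept method; that interface, the `SingleSuffix` class, and the suffix values "index" and "content" are hypothetical and not part of this API.

```java
import java.util.List;
import java.util.stream.Collectors;

final class ExtractorSelectionSketch {

    /** Stand-in that mirrors only LinkExtractor.accept(String). */
    interface SuffixAware {
        boolean accept(String suffix);
    }

    /** Hypothetical extractor that handles exactly one suffix. */
    static final class SingleSuffix implements SuffixAware {
        private final String suffix;

        SingleSuffix(String suffix) {
            this.suffix = suffix;
        }

        @Override
        public boolean accept(String candidate) {
            return suffix.equals(candidate);
        }

        @Override
        public String toString() {
            return "SingleSuffix(" + suffix + ")";
        }
    }

    /** Returns the extractors whose accept method returns true for the suffix. */
    static <T extends SuffixAware> List<T> select(List<T> extractors, String suffix) {
        return extractors.stream()
                .filter(e -> e.accept(suffix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SingleSuffix> all = List.of(new SingleSuffix("index"), new SingleSuffix("content"));
        System.out.println(select(all, "index"));
    }
}
```

With the real interface, a caller would presumably invoke getLinks only on the extractors selected this way.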