Class AnchorExtractor

  • All Implemented Interfaces:
    Callback

    public class AnchorExtractor
    extends DefaultCallback
    A callback extracting anchor text. When instantiating the extractor, you can specify the maximum number of characters to be considered before the anchor, in the anchor, and after the anchor (only the last characters are kept in the first case, and only the first characters in the other two).

    At the end of parsing, the result (the list of anchors) is available in anchors, whose elements provide the content of the href attribute and the text of the anchor together with the text around it. The text is, however, modified so that fragments of words at the beginning of the pre-anchor context, or at the end of the post-anchor context, are cut away.

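    For example, a fragment like: ...foo fOO FOO FOO ANCHOR TEXT BAR BAR BAr bar... (where the uppercase part represents the anchor text together with the pre- and post-anchor context selected by the character limits) generates the element Anchor("xxx", "FOO FOO ANCHOR TEXT BAR BAR"): the word fragments "OO" and "BA" at the two ends are cut away.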
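
    The trimming rule described above can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: the class AnchorContextSketch and its methods preContext, postContext and snippet are hypothetical names introduced only for illustration, and treating whitespace as the word boundary is an assumption. Limits are assumed to be non-negative.

```java
// Hypothetical sketch of the context-trimming rule; not part of the documented API.
final class AnchorContextSketch {
    private AnchorContextSketch() {}

    /** Keeps at most max trailing characters of before, dropping a leading word fragment. */
    public static String preContext(String before, int max) {
        if (before.length() <= max) return before.trim();
        int start = before.length() - max;
        String window = before.substring(start);
        // The cut splits a word when the characters on both sides of it are non-whitespace.
        if (max > 0 && !Character.isWhitespace(before.charAt(start - 1))
                && !Character.isWhitespace(window.charAt(0))) {
            int ws = firstWhitespace(window);
            window = ws < 0 ? "" : window.substring(ws);
        }
        return window.trim();
    }

    /** Keeps at most max leading characters of after, dropping a trailing word fragment. */
    public static String postContext(String after, int max) {
        if (after.length() <= max) return after.trim();
        String window = after.substring(0, max);
        if (max > 0 && !Character.isWhitespace(after.charAt(max))
                && !Character.isWhitespace(window.charAt(max - 1))) {
            int ws = lastWhitespace(window);
            window = ws < 0 ? "" : window.substring(0, ws);
        }
        return window.trim();
    }

    /** Joins the trimmed pre-anchor context, the truncated anchor and the trimmed post-anchor context. */
    public static String snippet(String before, String anchor, String after,
                                 int maxPre, int maxAnchor, int maxPost) {
        // Only the first maxAnchor characters of the anchor text are kept.
        String a = anchor.length() <= maxAnchor ? anchor : anchor.substring(0, maxAnchor);
        StringBuilder sb = new StringBuilder();
        for (String part : new String[] { preContext(before, maxPre), a, postContext(after, maxPost) }) {
            if (part.isEmpty()) continue;
            if (sb.length() > 0) sb.append(' ');
            sb.append(part);
        }
        return sb.toString();
    }

    private static int firstWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) if (Character.isWhitespace(s.charAt(i))) return i;
        return -1;
    }

    private static int lastWhitespace(String s) {
        for (int i = s.length() - 1; i >= 0; i--) if (Character.isWhitespace(s.charAt(i))) return i;
        return -1;
    }
}
```

    With these assumptions, snippet("foo foo foo foo ", "ANCHOR TEXT", " bar bar bar bar", 11, 20, 11) returns "foo foo ANCHOR TEXT bar bar": the 11-character pre-anchor window "oo foo foo " loses its leading fragment "oo", and the post-anchor window " bar bar ba" loses its trailing fragment "ba".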

    • Constructor Detail

      • AnchorExtractor

        public AnchorExtractor(int maxPreAnchor,
                               int maxAnchor,
                               int maxPostAnchor)
        Creates a new anchor extractor.
        Parameters:
        maxPreAnchor - maximum number of characters before an anchor.
        maxAnchor - maximum number of characters in an anchor.
        maxPostAnchor - maximum number of characters after an anchor.
      • AnchorExtractor

        public AnchorExtractor(int maxPreAnchor,
                               int maxAnchor,
                               int maxPostAnchor,
                               String delimiter)
        Creates a new anchor extractor that delimits the anchor text with the given token.
        Parameters:
        maxPreAnchor - maximum number of characters before an anchor.
        maxAnchor - maximum number of characters in an anchor.
        maxPostAnchor - maximum number of characters after an anchor.
        delimiter - a token that will be inserted to delimit the anchor text, or null for no delimiter.