All tools and classes used so far have a large number of options
that make them highly configurable. For example, a factory has other
properties that can be specified; see the Javadoc of the document
factory you are using. A common property is wordreader, which makes it
possible to specify a different instance of WordReader, the class that
is used to segment text into words and non-words. The standard
WordReader (FastBufferedReader) considers just letters and digits as
part of a word, but you can choose your own variant, and even specify
it directly on the command line: for instance,
-pwordreader=FastBufferedReader\(_\) specifies that underscores should
be considered as part of a word (the backslashes keep the shell from
interpreting the parentheses). More generally, you can specify an
expression that follows the conventions of dsiutils's ObjectParser and
that will be used to instantiate a WordReader.
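To make the effect of the underscore option concrete, here is a minimal, self-contained sketch of the segmentation rule described above. It is not MG4J's WordReader or FastBufferedReader code: the class WordSegmenterSketch, the method words, and the parameter extraConstituents are hypothetical names introduced only for illustration, and the sketch assumes that a word is a maximal run of letters, digits, and any extra word constituents.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; this is NOT MG4J's WordReader or FastBufferedReader. */
final class WordSegmenterSketch {

    /**
     * Returns the words found in text. A word is a maximal run of letters,
     * digits, or characters listed in extraConstituents (hypothetical
     * parameter); every other character is treated as a non-word.
     */
    static List<String> words(String text, String extraConstituents) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c) || extraConstituents.indexOf(c) >= 0) {
                current.append(c);              // still inside a word
            } else if (current.length() > 0) {
                result.add(current.toString()); // a non-word character ends the word
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            result.add(current.toString());     // flush the last word
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "max_value = 42";
        System.out.println(words(text, ""));    // prints [max, value, 42]
        System.out.println(words(text, "_"));   // prints [max_value, 42]
    }
}
```

With no extra constituents the underscore splits max_value into two words; once the underscore is treated as a word constituent, max_value is kept as a single word, which is the behavior the command-line option above asks for.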
All MG4J tools implement the standard --help option, which displays a
detailed help text.