Interface TermProcessor
- All Superinterfaces:
FlyweightPrototype<TermProcessor>, Serializable
- All Known Implementing Classes:
AbstractSnowballTermProcessor, DanishStemmer, DowncaseTermProcessor, DutchStemmer, EnglishStemmer, FinnishStemmer, FrenchStemmer, German2Stemmer, GermanStemmer, HungarianStemmer, ItalianStemmer, KraaijPohlmannStemmer, LovinsStemmer, NorwegianStemmer, NullTermProcessor, PorterStemmer, PortugueseStemmer, SpanishStemmer, SwedishStemmer
public interface TermProcessor extends Serializable, FlyweightPrototype<TermProcessor>
A term processor, implementing term/prefix transformation and possibly term/prefix filtering.
Index construction sometimes requires modifying the given terms: downcasing, stemming, and so on. The same transformation must be applied to terms in a query. This interface provides a uniform way to perform arbitrary term transformations.
Index construction also requires term filtering: processTerm(MutableString) may return false, indicating that the term should not be processed at all (e.g., because it is a stopword). Additionally, the method processPrefix(MutableString) may analogously process a prefix (used for prefix queries).
Implementations are encouraged to expose a singleton, when possible, by means of the static factory method getInstance().
Note: When merging multiple indices, MG4J checks that all components use the same term processor. Please implement equals(Object) correctly.
Warning: implementations of this class are not required to be thread-safe, but they provide flyweight copies. The copy() method is strengthened so as to return an instance of this class.
This interface was originally suggested by Fabien Campagne.
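The contract described above (transformation, filtering, a getInstance() singleton, and a meaningful equals(Object)) can be illustrated with a minimal sketch. To keep it compilable without MG4J on the classpath, the nested TermProcessor interface below is a stand-in that mirrors only the three documented methods, and StringBuilder replaces MutableString; both substitutions are assumptions, as are the class name DowncasingStopwordProcessor and its stopword list. The real interface also extends Serializable and FlyweightPrototype<TermProcessor>.

```java
import java.util.Locale;
import java.util.Set;

/** Sketch only: stand-in types, not the MG4J classes. */
class StopwordProcessorSketch {
    /** Stand-in mirroring the documented methods (StringBuilder replaces MutableString). */
    interface TermProcessor {
        boolean processTerm(StringBuilder term);
        boolean processPrefix(StringBuilder prefix);
        TermProcessor copy();
    }

    /** Hypothetical processor: downcases terms and filters a tiny stopword list. */
    static final class DowncasingStopwordProcessor implements TermProcessor {
        private static final DowncasingStopwordProcessor INSTANCE = new DowncasingStopwordProcessor();
        private static final Set<String> STOPWORDS = Set.of("the", "a", "of");

        private DowncasingStopwordProcessor() {}

        /** Static factory exposing the singleton, as the interface documentation encourages. */
        static DowncasingStopwordProcessor getInstance() { return INSTANCE; }

        /** Rewrites the buffer in place with its lower-case form. */
        private static void downcase(StringBuilder s) {
            String lower = s.toString().toLowerCase(Locale.ROOT);
            s.setLength(0);
            s.append(lower);
        }

        @Override public boolean processTerm(StringBuilder term) {
            if (term == null) return false;
            downcase(term);
            // false means "do not index this term"
            return !STOPWORDS.contains(term.toString());
        }

        @Override public boolean processPrefix(StringBuilder prefix) {
            if (prefix == null) return false;
            downcase(prefix); // same transformation as for terms, but no filtering
            return true;
        }

        /** Stateless, so the instance can serve as its own copy. */
        @Override public TermProcessor copy() { return this; }

        /** All instances behave identically, so they compare equal. */
        @Override public boolean equals(Object o) { return o instanceof DowncasingStopwordProcessor; }
        @Override public int hashCode() { return DowncasingStopwordProcessor.class.hashCode(); }
    }
}
```

In this sketch equality depends only on the class, which is enough for a processor without configuration; a processor with parameters would need to compare them as well.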
Method Summary
Modifier and Type | Method | Description
TermProcessor | copy() |
boolean | processPrefix(MutableString prefix) | Processes the given prefix, leaving the result in the same mutable string.
boolean | processTerm(MutableString term) | Processes the given term, leaving the result in the same mutable string.
Method Detail
-
processTerm
boolean processTerm(MutableString term)
Processes the given term, leaving the result in the same mutable string.
- Parameters:
term - a mutable string containing the term to be processed, or null.
- Returns:
true if the term is not null and should be indexed, false otherwise.
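As a rough illustration of how a caller might use the return value during indexing, the sketch below copies each token into a reusable buffer, lets the processor rewrite it in place, and keeps only the terms for which processTerm returns true. The nested interface is a stand-in with StringBuilder in place of MutableString, and indexableTerms and DOWNCASE_DROP_THE are hypothetical names introduced for this example; none of them are part of MG4J.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch only: stand-in types, not the MG4J classes. */
class ProcessTermSketch {
    /** Stand-in for the documented method (StringBuilder replaces MutableString). */
    interface TermProcessor {
        boolean processTerm(StringBuilder term);
    }

    /** Hypothetical processor: downcases every term and rejects the stopword "the". */
    static final TermProcessor DOWNCASE_DROP_THE = term -> {
        if (term == null) return false;
        String lower = term.toString().toLowerCase(Locale.ROOT);
        term.setLength(0);
        term.append(lower);
        return !lower.equals("the");
    };

    /** Returns the processed form of every token the processor accepts. */
    static List<String> indexableTerms(List<String> tokens, TermProcessor processor) {
        List<String> kept = new ArrayList<>();
        StringBuilder buffer = new StringBuilder(); // reused: the result is left in the same buffer
        for (String token : tokens) {
            buffer.setLength(0);
            buffer.append(token);
            if (processor.processTerm(buffer)) kept.add(buffer.toString());
        }
        return kept;
    }
}
```

The buffer is read only after the call returns, because the processed term replaces the original content rather than being returned separately.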
-
processPrefix
boolean processPrefix(MutableString prefix)
Processes the given prefix, leaving the result in the same mutable string.
This method is not used during the indexing phase, but rather at query time. If the user wants to specify a prefix query, it is sometimes necessary to transform the prefix (e.g., DowncaseTermProcessor.processPrefix(MutableString) downcases it).
This method is unlikely to return false, as it is usually not possible to foresee which prefixes belong to indexable words. If no natural transformation applies, this method should leave its argument unchanged.
- Parameters:
prefix - a mutable string containing a prefix to be processed, or null.
- Returns:
true if the prefix is not null and there might be an indexed word starting with prefix, false otherwise.
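The query-time role of processPrefix can be sketched as follows: the raw prefix typed by the user is transformed the same way the indexed terms were, and only then matched against the term list. This is an illustrative sketch under assumptions, not MG4J's query machinery: the nested interface is a stand-in using StringBuilder instead of MutableString, a sorted set of strings plays the part of the index dictionary, and expandPrefix and DOWNCASE are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableSet;

/** Sketch only: stand-in types, not the MG4J classes. */
class ProcessPrefixSketch {
    /** Stand-in for the documented method (StringBuilder replaces MutableString). */
    interface TermProcessor {
        boolean processPrefix(StringBuilder prefix);
    }

    /** Hypothetical processor: downcases the prefix and never rejects a non-null one. */
    static final TermProcessor DOWNCASE = prefix -> {
        if (prefix == null) return false;
        String lower = prefix.toString().toLowerCase(Locale.ROOT);
        prefix.setLength(0);
        prefix.append(lower);
        return true;
    };

    /** Lists the indexed terms starting with the processed form of rawPrefix (assumed non-null). */
    static List<String> expandPrefix(String rawPrefix, TermProcessor processor, NavigableSet<String> indexedTerms) {
        StringBuilder prefix = new StringBuilder(rawPrefix);
        if (!processor.processPrefix(prefix)) return List.of(); // no indexed word can match
        String processed = prefix.toString();
        List<String> matches = new ArrayList<>();
        // In natural string order, all terms sharing a prefix are contiguous from the prefix onward.
        for (String term : indexedTerms.tailSet(processed, true)) {
            if (!term.startsWith(processed)) break;
            matches.add(term);
        }
        return matches;
    }
}
```

Without the processPrefix call, a query for "HEL" would find nothing in an index whose terms were downcased at construction time.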
-
copy
TermProcessor copy()
- Specified by:
copy in interface FlyweightPrototype<TermProcessor>
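Since implementations are not required to be thread-safe, one plausible way to use copy() is to give every worker thread its own flyweight copy. The sketch below assumes a stand-in interface (StringBuilder instead of MutableString) and a hypothetical ScratchDowncaser whose internal scratch buffer makes sharing one instance across threads unsafe; processInParallel is likewise an illustrative helper, not an MG4J method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch only: stand-in types, not the MG4J classes. */
class FlyweightCopySketch {
    /** Stand-in for the documented methods (StringBuilder replaces MutableString). */
    interface TermProcessor {
        boolean processTerm(StringBuilder term);
        TermProcessor copy();
    }

    /** Hypothetical processor with mutable internal state, hence not thread-safe. */
    static final class ScratchDowncaser implements TermProcessor {
        private final StringBuilder scratch = new StringBuilder(); // per-instance working buffer

        @Override public boolean processTerm(StringBuilder term) {
            if (term == null) return false;
            scratch.setLength(0);
            for (int i = 0; i < term.length(); i++) scratch.append(Character.toLowerCase(term.charAt(i)));
            term.setLength(0);
            term.append(scratch);
            return true;
        }

        /** Each copy gets its own scratch buffer. */
        @Override public TermProcessor copy() { return new ScratchDowncaser(); }
    }

    /** Processes terms on several threads, each using a private copy of the prototype. */
    static List<String> processInParallel(List<String> terms, TermProcessor prototype, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            String[] out = new String[terms.size()];
            List<Future<?>> futures = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int start = w;
                final TermProcessor local = prototype.copy(); // copied on the calling thread
                futures.add(pool.submit(() -> {
                    // Worker w handles indices w, w + workers, w + 2 * workers, ...
                    for (int i = start; i < terms.size(); i += workers) {
                        StringBuilder sb = new StringBuilder(terms.get(i));
                        out[i] = local.processTerm(sb) ? sb.toString() : null;
                    }
                }));
            }
            for (Future<?> f : futures) f.get(); // wait for completion and surface failures
            return Arrays.asList(out);
        } catch (Exception e) {
            throw new IllegalStateException("parallel processing failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The copies are made before the tasks are submitted, so the prototype itself is only ever touched by the calling thread.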