Serialized Form
-
Package it.unimi.di.big.mg4j.document
-
Class it.unimi.di.big.mg4j.document.AbstractDocumentFactory extends Object implements Serializable
- serialVersionUID:
- 1L
-
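The serialVersionUID values listed on this page are the identifiers that Java serialization compares between the stream and the local class when an object is read back. As a minimal sketch, the following shows how such a value is declared and how it can be queried through the standard java.io.ObjectStreamClass API. The classes ExampleFactory and SerialVersionSketch are hypothetical and are not part of MG4J.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical class, used only for illustration; it is not part of MG4J.
class ExampleFactory implements Serializable {
    // Declared explicitly so that the identifier does not change on recompilation.
    private static final long serialVersionUID = 1L;

    int numberOfFields;
}

// Hypothetical helper, for illustration only.
class SerialVersionSketch {
    /** Returns the serialVersionUID that serialization associates with a serializable class. */
    static long uidOf(Class<?> c) {
        return ObjectStreamClass.lookup(c).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(ExampleFactory.class));
    }
}
```

Changing a declared value makes previously serialized instances unreadable, which is presumably why most classes listed here pin it to a small constant.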
Class it.unimi.di.big.mg4j.document.CompositeDocumentFactory extends AbstractDocumentFactory implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
documentFactory
DocumentFactory[] documentFactory
The array of document factories composing this composite document factory. -
factoryIndex
int[] factoryIndex
The factory of each field. -
field2Index
Object2IntOpenHashMap<String> field2Index
The map from field names to field indices. -
fieldName
String[] fieldName
The name of all fields in sequence. -
fieldType
DocumentFactory.FieldType[] fieldType
The type of all fields in sequence. -
numberOfFields
int numberOfFields
The overall number of fields (i.e., the sum of DocumentFactory.numberOfFields() over CompositeDocumentFactory.documentFactory). -
originalFieldIndex
int[] originalFieldIndex
The index of each field in its own factory.
-
-
Class it.unimi.di.big.mg4j.document.ConcatenatedDocumentCollection extends AbstractDocumentCollection implements Serializable
- serialVersionUID:
- 1L
-
Serialization Methods
-
readObject
private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException
- Throws:
IOException
ClassNotFoundException
-
-
Serialized Fields
-
collectionName
String[] collectionName
The names of the collections composing this concatenated document collection. -
n
int n
The length of ConcatenatedDocumentCollection.collection. -
startDocument
long[] startDocument
The array of starting documents (the last element is the overall number of documents).
-
-
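The startDocument array above stores cumulative starting documents, so a global document index can be mapped to a component collection with a binary search. The following is a sketch under stated assumptions, not the MG4J implementation: the class StartDocumentSketch and its methods are hypothetical, and the first element of the array is assumed to be 0.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only; not part of MG4J.
class StartDocumentSketch {
    /**
     * Returns the index of the collection containing a global document index.
     * Assumes startDocument[0] is 0, startDocument[i] is the first document of
     * collection i, and the last element is the overall number of documents.
     */
    static int collectionOf(long[] startDocument, long document) {
        long total = startDocument[startDocument.length - 1];
        if (document < 0 || document >= total) {
            throw new IndexOutOfBoundsException("No such document: " + document);
        }
        int pos = Arrays.binarySearch(startDocument, document);
        // Not found: the containing collection precedes the insertion point.
        if (pos < 0) return -pos - 2;
        // Found: skip empty collections that share the same starting document.
        while (startDocument[pos + 1] == document) pos++;
        return pos;
    }

    /** Returns the index of the document inside its own collection. */
    static long localIndex(long[] startDocument, long document) {
        return document - startDocument[collectionOf(startDocument, document)];
    }

    public static void main(String[] args) {
        long[] startDocument = { 0, 3, 3, 5 }; // three collections; the second is empty
        for (long d = 0; d < 5; d++) {
            System.out.println(d + " -> collection " + collectionOf(startDocument, d)
                    + ", local index " + localIndex(startDocument, d));
        }
    }
}
```

Keeping the overall number of documents as a final sentinel element lets the same array answer both "where does collection i start" and "where does it end" without a special case for the last collection.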
Class it.unimi.di.big.mg4j.document.CSVDocumentCollection extends AbstractDocumentSequence implements Serializable
- serialVersionUID:
- 1L
-
Serialization Methods
-
readObject
private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException
- Throws:
IOException
ClassNotFoundException
-
-
Serialized Fields
-
column
String[] column
The column names. -
factory
DocumentFactory factory
The factory to be used by this collection. -
fileName
String fileName
The CSV filename. -
separator
String separator
The field separator. -
titleColumn
int titleColumn
If nonnegative, the index of the column to be used as a title.
-
-
Class it.unimi.di.big.mg4j.document.DispatchingDocumentFactory extends PropertyBasedDocumentFactory implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
dispatchingKey
Enum<?> dispatchingKey
If a DispatchingDocumentFactory.StringBasedDispatchingStrategy should be used, this field represents the property key to be checked. Otherwise, this is null. -
documentFactory
DocumentFactory[] documentFactory
The subfactories used. -
fieldName
String[] fieldName
The names of the fields. -
fieldType
DocumentFactory.FieldType[] fieldType
The types of the fields. -
n
int n
The number of subfactories used. -
nullReader
WordReader nullReader
A word reader that is returned when a null field should be returned. -
numberOfFields
int numberOfFields
The number of fields of this factory. -
rename
int[][] rename
The array specifying how subfactory fields should be mapped into fields of this factory. More precisely, rename[f][k] specifies which field of factory documentFactory[f] should be used to return the field named fieldName[k]: it is assumed that the type of the field in the subfactory is correct (i.e., that documentFactory[f].fieldType(k)==fieldType[k]). The value -1 is used to return an empty textual field (i.e., a word reader on an empty string). -
strategy
DispatchingDocumentFactory.DispatchingStrategy strategy
The strategy to be used. -
value2factoryClass
Object2ObjectLinkedOpenHashMap<String,Class<? extends DocumentFactory>> value2factoryClass
If a DispatchingDocumentFactory.StringBasedDispatchingStrategy should be used, this field represents the map from values to factories.
-
-
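The rename table described above can be read as a two-level lookup: pick the row of the subfactory in use, then the column of the requested field, and treat -1 as "return an empty textual field". The sketch below illustrates that reading only. The class RenameSketch is hypothetical, and plain strings stand in for the word readers a real factory would return.

```java
// Hypothetical helper, for illustration only; not part of MG4J.
class RenameSketch {
    /**
     * Returns the content of field k of the dispatching factory when the
     * document is handled by subfactory f. subfactoryContent[f][j] stands in
     * for the content of field j as produced by subfactory f.
     */
    static String fieldContent(int[][] rename, String[][] subfactoryContent, int f, int k) {
        int source = rename[f][k];
        // -1 means that no subfactory field is mapped: return an empty textual field.
        return source == -1 ? "" : subfactoryContent[f][source];
    }

    public static void main(String[] args) {
        String[] fieldName = { "title", "text" };
        int[][] rename = {
            { 1, 0 },  // subfactory 0 keeps the title in its field 1 and the text in its field 0
            { 0, -1 }  // subfactory 1 has a title only
        };
        String[][] content = { { "body of A", "Title A" }, { "Title B" } };
        for (int f = 0; f < rename.length; f++) {
            for (int k = 0; k < fieldName.length; k++) {
                System.out.println(f + "/" + fieldName[k] + ": \""
                        + fieldContent(rename, content, f, k) + "\"");
            }
        }
    }
}
```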
Class it.unimi.di.big.mg4j.document.DispatchingDocumentFactory.StringBasedDispatchingStrategy extends Object implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
key
Enum<?> key
The key to be resolved. -
value
Object2IntMap<String> value
The values that should be used for comparisons.
-
-
Class it.unimi.di.big.mg4j.document.FileSetDocumentCollection extends AbstractDocumentCollection implements Serializable
- serialVersionUID:
- 0L
-
Serialized Fields
-
factory
DocumentFactory factory
The factory to be used by this collection. -
file
String[] file
The files in this collection. -
last
InputStream last
The last returned file input stream. -
uri
String[] uri
URIs for each file in this collection, or null, in which case the filename will be used as URI.
-
-
Class it.unimi.di.big.mg4j.document.HtmlDocumentFactory extends PropertyBasedDocumentFactory implements Serializable
- serialVersionUID:
- 1L
-
Serialization Methods
-
readObject
private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException
- Throws:
IOException
ClassNotFoundException
-
-
Serialized Fields
-
delimiter
String delimiter
A token that will be inserted to delimit the anchor text, or null for no delimiter. -
maxAnchor
int maxAnchor
The maximum number of characters in an anchor. -
maxPostAnchor
int maxPostAnchor
The maximum number of characters after an anchor. -
maxPreAnchor
int maxPreAnchor
The maximum number of characters before an anchor.
-
-
Class it.unimi.di.big.mg4j.document.IdentityDocumentFactory extends PropertyBasedDocumentFactory implements Serializable
- serialVersionUID:
- 2L
-
Serialization Methods
-
readObject
private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException
- Throws:
IOException
ClassNotFoundException
-
-
Serialized Fields
-
fieldName
String fieldName
The name of the only field.
-
-
Class it.unimi.di.big.mg4j.document.JavamailDocumentCollection extends AbstractDocumentCollection implements Serializable
- serialVersionUID:
- 2L
-
Serialization Methods
-
readResolve
private Object readResolve() throws javax.mail.MessagingException, IOException
- Throws:
javax.mail.MessagingException
IOException
-
-
Serialized Fields
-
Class it.unimi.di.big.mg4j.document.JdbcDocumentCollection extends AbstractDocumentCollection implements Serializable
- serialVersionUID:
- 1L
-
Serialization Methods
-
readObject
private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException
- Throws:
IOException
ClassNotFoundException
-
-
Serialized Fields
-
dbUri
String dbUri
The URI pointing at the database. -
doc2id
int[] doc2id
The map (as an array) from documents to database identifiers. -
factory
DocumentFactory factory
The factory to be used by this collection. -
id2doc
Int2IntMap id2doc
The map from database identifiers to documents. -
idSpec
String idSpec
The spec for the id field; by default it is id, but in complex queries it could be ambiguous. -
jdbcDriverName
String jdbcDriverName
Optionally, the driver name. -
select
String select
The query generating the collection (without the SELECT keyword). -
where
String where
The WHERE part of the query generating the collection (without the WHERE keyword), or null.
-
-
Class it.unimi.di.big.mg4j.document.PropertyBasedDocumentFactory extends AbstractDocumentFactory implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
defaultMetadata
Reference2ObjectMap<Enum<?>,Object> defaultMetadata
The set of default metadata for this factory. It is initialised by PropertyBasedDocumentFactory.parseProperties(Properties).
-
-
Class it.unimi.di.big.mg4j.document.ReplicatedDocumentFactory extends AbstractDocumentFactory implements Serializable
- serialVersionUID:
- 2L
-
Serialized Fields
-
documentFactory
DocumentFactory documentFactory
The document factory that will be replicated. -
field2Index
Object2IntOpenHashMap<String> field2Index
The map from field names to field indices. -
fieldName
String[] fieldName
The field names. -
numberOfCopies
int numberOfCopies
The number of copies.
-
-
Class it.unimi.di.big.mg4j.document.SimpleCompressedDocumentCollection extends AbstractDocumentCollection implements Serializable
- serialVersionUID:
- 1L
-
Serialization Methods
-
readObject
private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException
- Throws:
IOException
ClassNotFoundException
-
-
Serialized Fields
-
basename
String basename
The basename of this collection. -
documents
long documents
The number of documents in this collection. -
exact
boolean exact
Whether this collection is exact (i.e., whether it stores nonwords). -
factory
DocumentFactory factory
The underlying factory. -
fileMappingOk
boolean fileMappingOk
True if all memory mappings have been obtained. -
fileOpenOk
boolean fileOpenOk
True if all ancillary files have been correctly opened. -
hasNonText
boolean hasNonText
Whether this collection contains non-text or virtual fields. -
nonTerms
long nonTerms
The number of nonterms in this collection, or -1 if SimpleCompressedDocumentCollection.exact is false. -
terms
long terms
The number of terms in this collection.
-
-
Class it.unimi.di.big.mg4j.document.SubDocumentFactory extends AbstractDocumentFactory implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
field2Pos
Int2IntOpenHashMap field2Pos
A map from the original field index to the new index; returns -1 for non-mapped fields. -
underlyingFactory
DocumentFactory underlyingFactory
The underlying document factory. -
visibleField
int[] visibleField
The subfields of SubDocumentFactory.underlyingFactory that will be exposed.
-
-
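The relation between visibleField and field2Pos can be sketched as follows. This is an illustration under assumptions, not the MG4J code: it assumes that the new index of an exposed field is its position in visibleField, and it uses a java.util.HashMap with getOrDefault in place of a fastutil map with a default return value of -1. The class Field2PosSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of MG4J.
class Field2PosSketch {
    /** Maps each exposed original field index to its position in visibleField (an assumption). */
    static Map<Integer, Integer> buildField2Pos(int[] visibleField) {
        Map<Integer, Integer> field2Pos = new HashMap<>();
        for (int i = 0; i < visibleField.length; i++) {
            field2Pos.put(visibleField[i], i);
        }
        return field2Pos;
    }

    /** Returns the new index of an original field, or -1 if the field is not exposed. */
    static int newIndex(Map<Integer, Integer> field2Pos, int originalField) {
        return field2Pos.getOrDefault(originalField, -1);
    }

    public static void main(String[] args) {
        // Expose fields 4 and 1 of the underlying factory, in this order.
        Map<Integer, Integer> field2Pos = buildField2Pos(new int[] { 4, 1 });
        for (int original = 0; original < 5; original++) {
            System.out.println(original + " -> " + newIndex(field2Pos, original));
        }
    }
}
```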
Class it.unimi.di.big.mg4j.document.SubsetDocumentSequence extends AbstractDocumentSequence implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
documents
LongSet documents
The set of document pointers to be retained. -
underlyingSequence
DocumentSequence underlyingSequence
The underlying document sequence.
-
-
Class it.unimi.di.big.mg4j.document.TRECDocumentCollection extends AbstractDocumentCollection implements Serializable
- serialVersionUID:
- -4251461013312968454L
-
Serialization Methods
-
readObject
private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException
- Throws:
IOException
ClassNotFoundException
-
writeObject
private void writeObject(ObjectOutputStream s) throws IOException
- Throws:
IOException
-
-
Serialized Fields
-
buffer
byte[] buffer
-
bufferSize
int bufferSize
The buffer size. -
factory
DocumentFactory factory
The document factory. -
file
String[] file
The list of the files containing the documents. -
lastStream
SegmentedInputStream lastStream
The last returned stream. -
useGzip
boolean useGzip
Whether the files in TRECDocumentCollection.file are gzipped.
-
-
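A class that declares both writeObject and readObject, as TRECDocumentCollection does above, usually writes the default fields first and then appends data that the default mechanism skips, reading everything back in the same order. The sketch below shows that general Java idiom. The class PairedSerializationSketch and its fields are hypothetical, and the page does not say what TRECDocumentCollection actually writes by hand.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical class, for illustration only; not part of MG4J.
class PairedSerializationSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    String[] file;            // handled by the default mechanism
    transient long[] offset;  // skipped by the default mechanism, written by hand

    PairedSerializationSketch(String[] file, long[] offset) {
        this.file = file;
        this.offset = offset;
    }

    private void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();          // writes the non-transient fields
        s.writeInt(offset.length);       // then the hand-written part
        for (long o : offset) s.writeLong(o);
    }

    private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
        s.defaultReadObject();           // must mirror the order used in writeObject
        offset = new long[s.readInt()];
        for (int i = 0; i < offset.length; i++) offset[i] = s.readLong();
    }

    /** Serializes and deserializes an instance in memory. */
    static PairedSerializationSketch roundTrip(PairedSerializationSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (PairedSerializationSketch) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PairedSerializationSketch copy = roundTrip(
                new PairedSerializationSketch(new String[] { "a.txt" }, new long[] { 0, 42 }));
        System.out.println(copy.file[0] + " " + copy.offset[1]);
    }
}
```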
Class it.unimi.di.big.mg4j.document.TRECHeaderDocumentFactory extends AbstractDocumentFactory implements Serializable
- serialVersionUID:
- -8671564750345493607L
-
Serialized Fields
-
buffer
byte[] buffer
-
-
Class it.unimi.di.big.mg4j.document.WarcDocumentSequence extends AbstractDocumentSequence implements Serializable
- serialVersionUID:
- 0L
-
Serialized Fields
-
bufferSize
int bufferSize
The buffer size used for reads. -
factory
DocumentFactory factory
The user specified factory. -
useGzip
boolean useGzip
Whether the WARC files are gzipped. -
warcFile
String[] warcFile
The list of WARC files.
-
-
Class it.unimi.di.big.mg4j.document.WikipediaDocumentCollection extends AbstractDocumentCollection implements Serializable
- serialVersionUID:
- 1L
-
Serialization Methods
-
readObject
private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException
- Throws:
IOException
ClassNotFoundException
-
-
Serialized Fields
-
factory
DocumentFactory factory
The factory to be used by this collection. -
file
String[] file
The files in this collection. -
firstDocument
long[] firstDocument
An array parallel to WikipediaDocumentCollection.file containing the index of the first document within each file, plus a final entry equal to WikipediaDocumentCollection.size. -
gzipped
boolean gzipped
Whether the files in WikipediaDocumentCollection.file are gzipped. -
phrase
boolean phrase
Whether this index contains phrases (as opposed to documents). -
pointers
ObjectArrayList<EliasFanoMonotoneLongBigList> pointers
A list of lists of pointers parallel to WikipediaDocumentCollection.file. Each list contains the starting pointer of each document (within its file), plus a final pointer at the end of the file. -
size
int size
The number of documents in this collection.
-
-
Class it.unimi.di.big.mg4j.document.WikipediaDocumentCollection.WhitespaceWordReader extends FastBufferedReader implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.document.WikipediaDocumentSequence extends AbstractDocumentSequence implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
baseURL
String baseURL
The base URL for pages (e.g., http://en.wikipedia.org/wiki/). -
bzipped
boolean bzipped
Whether the input is compressed with bzip2. -
factory
DocumentFactory factory
The prototype CompositeDocumentFactory used to parse Wikipedia pages. -
imageBaseURL
String imageBaseURL
WikipediaDocumentSequence.baseURL concatenated with ${image}. -
keepNamespaced
boolean keepNamespaced
Whether to keep namespaced pages in the index. -
linkBaseURL
String linkBaseURL
WikipediaDocumentSequence.baseURL concatenated with ${title}. -
nameSpaces
com.google.common.collect.ImmutableSet<MutableString> nameSpaces
The set of namespaces specified in WikipediaDocumentSequence.wikipediaXmlDump. -
parseText
boolean parseText
Whether to parse text (e.g., we do not parse text when computing titles/URIs). -
redirectAnchors
ObjectArrayList<AnchorExtractor.Anchor> redirectAnchors
This list (whose access must be synchronized) accumulates virtual text (anchors) generated by redirects. It is filled when meeting redirect pages, and it is emptied at the first non-redirect page (the page in which the list is emptied is immaterial). Note that because of this setup, if there are some redirect pages that are not followed by any indexed page the anchors of those redirects won't be processed at all. If this is a problem, just add a fake empty page at the end. -
wikiModel
it.unimi.di.big.mg4j.document.WikipediaDocumentSequence.MyWikiModel wikiModel
The Bliki model used to parse pages. -
wikipediaXmlDump
String wikipediaXmlDump
The Wikipedia XML dump.
-
-
Class it.unimi.di.big.mg4j.document.WikipediaDocumentSequence.SignedRedirectedStringMap extends AbstractObject2LongFunction<CharSequence> implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
numberOfDocuments
long numberOfDocuments
The number of documents. -
signedFunction
Object2LongFunction<CharSequence> signedFunction
A signed function mapping valid keys to their ordinal position. -
target
long[] target
The value to be returned for keys whose ordinal position is greater than WikipediaDocumentSequence.SignedRedirectedStringMap.numberOfDocuments.
-
-
Class it.unimi.di.big.mg4j.document.WikipediaDocumentSequence.WikipediaHeaderFactory extends AbstractDocumentFactory implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
wordReader
WordReader wordReader
-
-
Class it.unimi.di.big.mg4j.document.ZipDocumentCollection extends AbstractDocumentCollection implements Serializable
- serialVersionUID:
- 2L
-
Serialization Methods
-
readObject
private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException
- Throws:
IOException
ClassNotFoundException
-
-
Serialized Fields
-
exact
boolean exact
true iff this is an exact reproduction of the original sequence (i.e., if non-words are also preserved). -
factory
DocumentFactory factory
The factory used for this document collection. -
numberOfDocuments
long numberOfDocuments
The number of documents. -
underlyingFactory
DocumentFactory underlyingFactory
The factory used for the original document sequence. -
zipFilename
String zipFilename
The name of the zip collection file.
-
-
Class it.unimi.di.big.mg4j.document.ZipDocumentCollection.ZipFactory extends AbstractDocumentFactory implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
exact
boolean exact
-
underlyingFactory
DocumentFactory underlyingFactory
-
-
-
Package it.unimi.di.big.mg4j.document.tika
-
Class it.unimi.di.big.mg4j.document.tika.AbstractSimpleTikaDocumentFactory extends AbstractTikaDocumentFactory implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
fields
List<TikaField> fields
The list of tika fields. -
wordReader
WordReader wordReader
The word reader used by this class.
-
-
Class it.unimi.di.big.mg4j.document.tika.AbstractTikaDocumentFactory extends PropertyBasedDocumentFactory implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.document.tika.AutoDetectDocumentFactory extends AbstractSimpleTikaDocumentFactory implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.document.tika.EPUBDocumentFactory extends AbstractSimpleTikaDocumentFactory implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.document.tika.GreedyTikaField extends TikaField implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.document.tika.HtmlDocumentFactory extends AbstractSimpleTikaDocumentFactory implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.document.tika.MSOfficeDocumentFactory extends AbstractSimpleTikaDocumentFactory implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.document.tika.OOXMLDocumentFactory extends AbstractSimpleTikaDocumentFactory implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.document.tika.OpenDocumentDocumentFactory extends AbstractSimpleTikaDocumentFactory implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.document.tika.PdfDocumentFactory extends AbstractSimpleTikaDocumentFactory implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.document.tika.RTFDocumentFactory extends AbstractSimpleTikaDocumentFactory implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.document.tika.TextDocumentFactory extends AbstractSimpleTikaDocumentFactory implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.document.tika.TikaField extends Object implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.document.tika.XMLDocumentFactory extends AbstractSimpleTikaDocumentFactory implements Serializable
- serialVersionUID:
- 1L
-
-
Package it.unimi.di.big.mg4j.index
-
Class it.unimi.di.big.mg4j.index.BitStreamHPIndex extends BitStreamIndex implements Serializable
- serialVersionUID:
- 0L
-
Serialization Methods
-
readObject
private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException
- Throws:
IOException
ClassNotFoundException
-
-
Class it.unimi.di.big.mg4j.index.BitStreamIndex extends Index implements Serializable
- serialVersionUID:
- 0L
-
Serialization Methods
-
readObject
private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException
- Throws:
IOException
ClassNotFoundException
-
-
Serialized Fields
-
bufferSize
int bufferSize
The size of the buffer used to read the bit stream. -
countCoding
CompressionFlags.Coding countCoding
The coding for counts. See CompressionFlags. -
frequencyCoding
CompressionFlags.Coding frequencyCoding
The coding for frequencies. See CompressionFlags. -
height
int height
The parameter h (the maximum height of a skip tower), or -1 if this index has no skips. -
offsets
LongBigList offsets
The offset of each term, if offsets were loaded or specified at creation time, or null. -
pointerCoding
CompressionFlags.Coding pointerCoding
The coding for pointers. See CompressionFlags. -
positionCoding
CompressionFlags.Coding positionCoding
The coding for positions. See CompressionFlags. -
quantum
int quantum
The quantum, or -1 if this index has no skips, or 0 if this is a BitStreamHPIndex and quanta are variable.
-
-
Class it.unimi.di.big.mg4j.index.DowncaseTermProcessor extends Object implements Serializable
- serialVersionUID:
- 1L
-
Serialization Methods
-
readResolve
private Object readResolve()
-
-
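A private readResolve() method with no serialized fields, as listed above for DowncaseTermProcessor, is the usual Java idiom for keeping a singleton unique across serialization: whatever object the stream produces is replaced by the canonical instance. The sketch below shows that idiom with a hypothetical class, SingletonProcessorSketch. Treating DowncaseTermProcessor as a singleton is an assumption based on the listing, not something this page states.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical class, for illustration only; not part of MG4J.
class SingletonProcessorSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final SingletonProcessorSketch INSTANCE = new SingletonProcessorSketch();

    private SingletonProcessorSketch() {}

    static SingletonProcessorSketch getInstance() {
        return INSTANCE;
    }

    // Called by the serialization runtime after the object has been read;
    // the freshly created copy is discarded in favour of the canonical instance.
    private Object readResolve() {
        return INSTANCE;
    }

    /** Serializes and deserializes an object in memory. */
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Identity, not just equality, is preserved across the round trip.
        System.out.println(roundTrip(getInstance()) == getInstance());
    }
}
```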
Class it.unimi.di.big.mg4j.index.FileHPIndex extends BitStreamHPIndex implements Serializable
- serialVersionUID:
- 0L
-
Serialized Fields
-
Class it.unimi.di.big.mg4j.index.FileIndex extends BitStreamIndex implements Serializable
- serialVersionUID:
- 0L
-
Class it.unimi.di.big.mg4j.index.Index extends Object implements Serializable
- serialVersionUID:
- 0L
-
Serialized Fields
-
field
String field
The field indexed by this index, or null. -
hasCounts
boolean hasCounts
Whether this index contains counts. -
hasPayloads
boolean hasPayloads
Whether this index contains payloads; if true, Index.payload is non-null. -
hasPositions
boolean hasPositions
Whether this index contains positions. -
keyIndex
Index keyIndex
The index used as a key to retrieve intervals. Usually equal to this, but it is settable. -
maxCount
int maxCount
The maximum number of positions in a position list, or possibly -1 if this index does not have positions. -
numberOfDocuments
long numberOfDocuments
The number of documents of the collection. -
numberOfOccurrences
long numberOfOccurrences
The number of occurrences of the collection, or possibly -1 if it is unknown. -
numberOfPostings
long numberOfPostings
The number of postings (pairs term/document) of the collection. -
numberOfTerms
long numberOfTerms
The number of terms of the collection. This field might be set to -1 in some cases (for instance, in certain documental clusters). -
payload
Payload payload
The payload for this index, or null. -
prefixMap
PrefixMap<? extends CharSequence> prefixMap
The prefix map for this index, or null if the prefix map was not loaded. -
properties
Properties properties
The properties of this index. It is stored here for convenience (for instance, if custom keys are added to the property file), but it may be null. -
singletonSet
ReferenceSet<Index> singletonSet
An immutable singleton set containing just Index.keyIndex. -
sizes
IntBigList sizes
The size of each document, or null if sizes are not necessary or not loaded in this index. -
termMap
StringMap<? extends CharSequence> termMap
The term map for this index, or null if the term map was not loaded. -
termProcessor
TermProcessor termProcessor
The term processor used to build this index.
-
-
Class it.unimi.di.big.mg4j.index.Index.EmptyIndexIterator extends Object implements Serializable
- serialVersionUID:
- 0L
-
Serialized Fields
-
id
int id
-
term
String term
-
termNumber
long termNumber
-
weight
double weight
-
-
Class it.unimi.di.big.mg4j.index.InMemoryHPIndex extends BitStreamHPIndex implements Serializable
- serialVersionUID:
- 0L
-
Serialized Fields
-
index
byte[] index
The byte array containing the index. -
positions
byte[] positions
The byte array containing the positions.
-
-
Class it.unimi.di.big.mg4j.index.InMemoryIndex extends BitStreamIndex implements Serializable
- serialVersionUID:
- 0L
-
Serialized Fields
-
index
byte[] index
The byte array containing the index.
-
-
Class it.unimi.di.big.mg4j.index.MemoryMappedHPIndex extends BitStreamHPIndex implements Serializable
- serialVersionUID:
- 0L
-
Serialized Fields
-
index
ByteBufferInputStream index
The byte buffer containing the index. -
positions
ByteBufferInputStream positions
The byte buffer containing the positions.
-
-
Class it.unimi.di.big.mg4j.index.MemoryMappedIndex extends BitStreamIndex implements Serializable
- serialVersionUID:
- 0L
-
Serialized Fields
-
index
ByteBufferInputStream index
The byte buffer containing the index.
-
-
Class it.unimi.di.big.mg4j.index.NullTermProcessor extends Object implements Serializable
- serialVersionUID:
- 1L
-
Serialization Methods
-
readResolve
private Object readResolve()
-
-
Class it.unimi.di.big.mg4j.index.QuasiSuccinctIndex extends Index implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
counts
LongBigList counts
The big list of longs representing the bitstream of counts. -
countsOffsets
LongBigList countsOffsets
The list of offsets into counts. -
log2Quantum
int log2Quantum
The logarithm of the skipping quantum. -
pointers
LongBigList pointers
The big list of longs representing the bitstream of pointers. -
pointersOffsets
LongBigList pointersOffsets
The list of offsets into pointers. -
positions
LongBigList positions
The big list of longs representing the bitstream of positions. -
positionsOffsets
LongBigList positionsOffsets
The list of offsets into positions.
-
-
Class it.unimi.di.big.mg4j.index.TooManyTermsException extends Exception implements Serializable
- serialVersionUID:
- 0L
-
Serialized Fields
-
numberOfTerms
long numberOfTerms
-
-
-
Package it.unimi.di.big.mg4j.index.cluster
-
Class it.unimi.di.big.mg4j.index.cluster.ChainedLexicalClusteringStrategy extends Object implements Serializable
- serialVersionUID:
- 0L
-
Class it.unimi.di.big.mg4j.index.cluster.ContiguousDocumentalStrategy extends Object implements Serializable
- serialVersionUID:
- 0L
-
Serialized Fields
-
cutPoint
long[] cutPoint
The cutpoints. -
k
int k
The (cached) number of segments.
-
-
Class it.unimi.di.big.mg4j.index.cluster.ContiguousLexicalStrategy extends Object implements Serializable
- serialVersionUID:
- 0L
-
Serialized Fields
-
cutPoint
int[] cutPoint
The cutpoints. -
cutPointTerm
MutableString[] cutPointTerm
The cutpoint terms. -
k
int k
The (cached) number of segments.
-
-
Class it.unimi.di.big.mg4j.index.cluster.DocumentalCluster extends IndexCluster implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
allIndices
int[] allIndices
An array containing the numbers from 0 to the number of local indices (excluded). Used to implement IndexReader.documents(long) more efficiently in flat indices. -
concatenated
boolean concatenated
Whether this documental cluster is concatenated. -
flat
boolean flat
Whether this documental cluster is flat; in this case, all local indices have the same term list. -
strategy
DocumentalClusteringStrategy strategy
The clustering strategy.
-
-
Class it.unimi.di.big.mg4j.index.cluster.DocumentalConcatenatedCluster extends DocumentalCluster implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.index.cluster.DocumentalMergedCluster extends DocumentalCluster implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.index.cluster.FrequencyLexicalStrategy extends Object implements Serializable
- serialVersionUID:
- 0L
-
Serialized Fields
-
localNumber
Long2LongOpenHashMap localNumber
The local number of each term.
-
-
Class it.unimi.di.big.mg4j.index.cluster.IdentityDocumentalStrategy extends Object implements Serializable
- serialVersionUID:
- 0L
-
Serialized Fields
-
numberOfDocuments
long numberOfDocuments
The number of documents. -
numberOfLocalIndices
int numberOfLocalIndices
The number of local indices.
-
-
Class it.unimi.di.big.mg4j.index.cluster.IndexCluster extends Index implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
localIndex
Index[] localIndex
The local indices of this cluster. -
termFilter
BloomFilter<Void>[] termFilter
An array of Bloom filters to reduce index access, or null.
-
-
Class it.unimi.di.big.mg4j.index.cluster.LexicalCluster extends IndexCluster implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
partitioningStrategy
LexicalPartitioningStrategy partitioningStrategy
The strategy, cast to a partition strategy, or null. -
strategy
LexicalClusteringStrategy strategy
The strategy to be used.
-
-
-
Package it.unimi.di.big.mg4j.index.payload
-
Class it.unimi.di.big.mg4j.index.payload.AbstractPayload extends Object implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.index.payload.DatePayload extends AbstractPayload implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
secondsFromEpoch
long secondsFromEpoch
-
-
Class it.unimi.di.big.mg4j.index.payload.IntegerPayload extends AbstractPayload implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
unset
boolean unset
Whether this payload has been ever set. -
value
long value
The current value of this payload, if IntegerPayload.unset is false.
-
-
-
Package it.unimi.di.big.mg4j.index.snowball
-
Class it.unimi.di.big.mg4j.index.snowball.AbstractSnowballTermProcessor extends Object implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
array
char[] array
-
bra
int bra
-
copy
MutableString copy
-
current
MutableString current
-
cursor
int cursor
-
ket
int ket
-
limit
int limit
-
limit_backward
int limit_backward
-
-
Class it.unimi.di.big.mg4j.index.snowball.DanishStemmer extends AbstractSnowballTermProcessor implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
I_p1
int I_p1
-
I_x
int I_x
-
S_ch
MutableString S_ch
-
-
Class it.unimi.di.big.mg4j.index.snowball.DutchStemmer extends AbstractSnowballTermProcessor implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
B_e_found
boolean B_e_found
-
I_p1
int I_p1
-
I_p2
int I_p2
-
-
Class it.unimi.di.big.mg4j.index.snowball.EnglishStemmer extends AbstractSnowballTermProcessor implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
B_Y_found
boolean B_Y_found
-
I_p1
int I_p1
-
I_p2
int I_p2
-
-
Class it.unimi.di.big.mg4j.index.snowball.FinnishStemmer extends AbstractSnowballTermProcessor implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
B_ending_removed
boolean B_ending_removed
-
I_p1
int I_p1
-
I_p2
int I_p2
-
S_x
MutableString S_x
-
-
Class it.unimi.di.big.mg4j.index.snowball.FrenchStemmer extends AbstractSnowballTermProcessor implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
I_p1
int I_p1
-
I_p2
int I_p2
-
I_pV
int I_pV
-
-
Class it.unimi.di.big.mg4j.index.snowball.German2Stemmer extends AbstractSnowballTermProcessor implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
I_p1
int I_p1
-
I_p2
int I_p2
-
I_x
int I_x
-
-
Class it.unimi.di.big.mg4j.index.snowball.GermanStemmer extends AbstractSnowballTermProcessor implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
I_p1
int I_p1
-
I_p2
int I_p2
-
I_x
int I_x
-
-
Class it.unimi.di.big.mg4j.index.snowball.HungarianStemmer extends AbstractSnowballTermProcessor implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
I_p1
int I_p1
-
-
Class it.unimi.di.big.mg4j.index.snowball.ItalianStemmer extends AbstractSnowballTermProcessor implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
I_p1
int I_p1
-
I_p2
int I_p2
-
I_pV
int I_pV
-
-
Class it.unimi.di.big.mg4j.index.snowball.KraaijPohlmannStemmer extends AbstractSnowballTermProcessor implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
B_GE_removed
boolean B_GE_removed
-
B_stemmed
boolean B_stemmed
-
B_Y_found
boolean B_Y_found
-
I_p1
int I_p1
-
I_p2
int I_p2
-
I_x
int I_x
-
S_ch
MutableString S_ch
-
-
Class it.unimi.di.big.mg4j.index.snowball.LovinsStemmer extends AbstractSnowballTermProcessor implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.index.snowball.NorwegianStemmer extends AbstractSnowballTermProcessor implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
I_p1
int I_p1
-
I_x
int I_x
-
-
Class it.unimi.di.big.mg4j.index.snowball.PorterStemmer extends AbstractSnowballTermProcessor implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
B_Y_found
boolean B_Y_found
-
I_p1
int I_p1
-
I_p2
int I_p2
-
-
Class it.unimi.di.big.mg4j.index.snowball.PortugueseStemmer extends AbstractSnowballTermProcessor implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
I_p1
int I_p1
-
I_p2
int I_p2
-
I_pV
int I_pV
-
-
Class it.unimi.di.big.mg4j.index.snowball.SpanishStemmer extends AbstractSnowballTermProcessor implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
I_p1
int I_p1
-
I_p2
int I_p2
-
I_pV
int I_pV
-
-
Class it.unimi.di.big.mg4j.index.snowball.SwedishStemmer extends AbstractSnowballTermProcessor implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
I_p1
int I_p1
-
I_x
int I_x
-
-
-
Package it.unimi.di.big.mg4j.query
-
Class it.unimi.di.big.mg4j.query.FileSystemItem extends HttpServlet implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.query.GenericItem extends VelocityViewServlet implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.query.HelpPage extends VelocityViewServlet implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.query.InputStreamItem extends HttpServlet implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.query.MarkingMutableString extends MutableString implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
count
int count
-
currMarkingInterval
int currMarkingInterval
-
currResumeInterval
int currResumeInterval
-
escapeStrategy
MarkingMutableString.EscapeStrategy escapeStrategy
-
interval
SelectedInterval[] interval
The current set of intervals for marking. -
intervalSurround
int intervalSurround
The number of surrounding words around each interval. -
marker
Marker marker
-
marking
boolean marking
-
oneCharOut
boolean oneCharOut
-
resume
boolean resume
-
skipping
boolean skipping
-
-
Class it.unimi.di.big.mg4j.query.QueryServlet extends VelocityViewServlet implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
derelativise
boolean derelativise
If true, URIs are files that should be derelativised. -
documentCollection
DocumentCollection documentCollection
The document collection. -
indexMap
Object2ReferenceMap<String,Index> indexMap
A sorted map from index names to indices: the first entry is the default index. -
queryEngine
QueryEngine queryEngine
The query engine. -
sortedIndex
Index[] sortedIndex
The indices of the fields specified in the index map, in increasing order (for document access). -
template
String template
The actual template used by this servlet (default: QueryServlet.DEFAULT_TEMPLATE). -
titleList
BigList<? extends CharSequence> titleList
An optional title list if the document collection is not present. -
urlEncodedMimeType
String urlEncodedMimeType
If not null, a MIME type suggested to the servlet. -
useUri
boolean useUri
If true, the link associated with each item must be built using the document URI.
-
-
Class it.unimi.di.big.mg4j.query.SelectedInterval extends Object implements Serializable
- serialVersionUID:
- 0L
-
Serialized Fields
-
interval
Interval interval
The underlying interval. -
type
SelectedInterval.IntervalType type
The interval type, or null for an untyped interval.
-
-
-
Package it.unimi.di.big.mg4j.query.nodes
-
Class it.unimi.di.big.mg4j.query.nodes.Align extends Object implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.query.nodes.And extends Composite implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.query.nodes.Annotation extends Object implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
query
Query query
The only underlying node.
-
-
Class it.unimi.di.big.mg4j.query.nodes.Composite extends Object implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
query
Query[] query
The component queries. Although public, this field should not be changed after creation.
-
-
Class it.unimi.di.big.mg4j.query.nodes.Consecutive extends Composite implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
gap
int[] gap
The gap array for this consecutive composition, or null for no gaps (see ConsecutiveDocumentIterator). The array can be as long as Composite.query, or have an additional element representing a final gap: in this case, the index against which the query is resolved must provide document sizes.
-
-
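The length rule for the gap array described above can be illustrated with a small check. This is only a sketch: GapArraySketch and describe are hypothetical names, not part of MG4J, and the number of component queries is passed as a plain int instead of a Query[] array.

```java
final class GapArraySketch {
    /**
     * Classifies a gap array against the number of component queries.
     * Hypothetical helper: it only mirrors the rule stated in the field description.
     */
    static String describe(int queryCount, int[] gap) {
        if (gap == null) return "no gaps";
        if (gap.length == queryCount) return "one gap per component query";
        if (gap.length == queryCount + 1) return "final gap present: the index must provide document sizes";
        throw new IllegalArgumentException(
            "gap array of length " + gap.length + " does not match " + queryCount + " queries");
    }

    public static void main(String[] args) {
        System.out.println(describe(3, null));
        System.out.println(describe(3, new int[] { 0, 1, 0 }));
        System.out.println(describe(3, new int[] { 0, 1, 0, 2 }));
    }
}
```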
Class it.unimi.di.big.mg4j.query.nodes.Containment extends Object implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.query.nodes.Difference extends Object implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
Class it.unimi.di.big.mg4j.query.nodes.False extends Object implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.query.nodes.Inclusion extends Object implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
Class it.unimi.di.big.mg4j.query.nodes.LowPass extends Object implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
k
int k
The threshold above which intervals are eliminated. -
query
Query query
The only underlying node.
-
-
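The role of the threshold k can be sketched as a filter over intervals. The code below is an illustration under stated assumptions, not the MG4J implementation: intervals are modelled as int[] { left, right } pairs with inclusive extremes, the length of an interval is assumed to be right - left + 1, and LowPassSketch and lowPass are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class LowPassSketch {
    /**
     * Keeps only the intervals whose length does not exceed k.
     * Assumption: an interval is { left, right } with both extremes included.
     */
    static List<int[]> lowPass(List<int[]> intervals, int k) {
        List<int[]> kept = new ArrayList<>();
        for (int[] interval : intervals) {
            int length = interval[1] - interval[0] + 1;
            if (length <= k) kept.add(interval);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<int[]> intervals = new ArrayList<>();
        intervals.add(new int[] { 0, 1 });  // length 2: kept for k = 2
        intervals.add(new int[] { 2, 7 });  // length 6: eliminated
        intervals.add(new int[] { 4, 4 });  // length 1: kept
        System.out.println(lowPass(intervals, 2).size());
    }
}
```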
Class it.unimi.di.big.mg4j.query.nodes.MultiTerm extends Composite implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.query.nodes.Not extends Object implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
query
Query query
The only underlying node.
-
-
Class it.unimi.di.big.mg4j.query.nodes.Or extends Composite implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.query.nodes.OrderedAnd extends Composite implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.query.nodes.Prefix extends Object implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
prefix
CharSequence prefix
The common prefix of the set of terms represented by this node.
-
-
Class it.unimi.di.big.mg4j.query.nodes.QueryBuilderVisitorException extends Exception implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.query.nodes.Range extends Object implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
left
CharSequence left
The string representation of the left extreme of the range, or null for no left extreme. -
right
CharSequence right
The string representation of the right extreme of the range, or null for no right extreme.
-
-
Class it.unimi.di.big.mg4j.query.nodes.Remap extends Object implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
indexInverseRemapping
Object2ObjectLinkedOpenHashMap<String,String> indexInverseRemapping
The remapping from external to internal indices. -
indexRemapping
Object2ObjectLinkedOpenHashMap<String,String> indexRemapping
The remapping from internal to external indices. -
query
Query query
The only underlying node.
-
-
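The two remapping fields are inverses of each other. A minimal sketch of how one could be derived from the other follows; it uses java.util.LinkedHashMap as a JDK stand-in for Object2ObjectLinkedOpenHashMap, and RemapSketch, invert and the index names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RemapSketch {
    /**
     * Builds the inverse of a remapping, preserving insertion order.
     * Rejects remappings that send two names to the same target, since those have no inverse.
     */
    static Map<String, String> invert(Map<String, String> remapping) {
        Map<String, String> inverse = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : remapping.entrySet()) {
            if (inverse.put(entry.getValue(), entry.getKey()) != null) {
                throw new IllegalArgumentException("Not injective: " + entry.getValue());
            }
        }
        return inverse;
    }

    public static void main(String[] args) {
        // Hypothetical index names: internal "text" is exposed externally as "body".
        Map<String, String> internalToExternal = new LinkedHashMap<>();
        internalToExternal.put("text", "body");
        internalToExternal.put("title", "heading");
        System.out.println(invert(internalToExternal));
    }
}
```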
Class it.unimi.di.big.mg4j.query.nodes.Select extends Object implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
index
CharSequence index
The name of the index selected for the subquery. -
query
Query query
The only underlying node.
-
-
Class it.unimi.di.big.mg4j.query.nodes.Term extends Object implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
term
CharSequence term
The term represented by this node, or null if the term is defined by its number. -
termNumber
int termNumber
The number of the term represented by this node, or -1 if the term is defined literally.
-
-
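The two fields of Term are mutually exclusive: either the term is given literally and the number is -1, or the number is given and the literal is null. The following stand-in class sketches that invariant; TermSketch, isLiteral and the "#" notation used in toString are illustrative assumptions, not the MG4J API.

```java
final class TermSketch {
    /** The literal term, or null if the term is defined by its number. */
    final CharSequence term;
    /** The term number, or -1 if the term is defined literally. */
    final int termNumber;

    TermSketch(CharSequence term) {
        if (term == null) throw new NullPointerException("term");
        this.term = term;
        this.termNumber = -1;
    }

    TermSketch(int termNumber) {
        if (termNumber < 0) throw new IllegalArgumentException("Negative term number: " + termNumber);
        this.term = null;
        this.termNumber = termNumber;
    }

    boolean isLiteral() {
        return term != null;
    }

    @Override
    public String toString() {
        // The "#" prefix is only an illustrative way to tell the two cases apart.
        return isLiteral() ? term.toString() : "#" + termNumber;
    }

    public static void main(String[] args) {
        System.out.println(new TermSketch("foo") + " " + new TermSketch(7));
    }
}
```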
Class it.unimi.di.big.mg4j.query.nodes.True extends Object implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.query.nodes.Weight extends Object implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
query
Query query
The only underlying node. -
weight
double weight
The weight selection.
-
-
-
Package it.unimi.di.big.mg4j.query.parser
-
Class it.unimi.di.big.mg4j.query.parser.ParseException extends Exception implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
currentToken
Token currentToken
This is the last token that has been consumed successfully. If this object has been created due to a parse error, the token following this token will (therefore) be the first error token. -
expectedTokenSequences
int[][] expectedTokenSequences
Each entry in this array is an array of integers. Each array of integers represents a sequence of tokens (by their ordinal values) that is expected at this point of the parse. -
tokenImage
String[] tokenImage
This is a reference to the "tokenImage" array of the generated parser within which the parse error occurred. This array is defined in the generated ...Constants interface.
-
-
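The relationship between expectedTokenSequences and tokenImage can be sketched as follows: each inner array lists token kinds, and each kind is an index into tokenImage. The helper below works on plain arrays rather than on a ParseException instance; ExpectedTokensSketch, describe and the sample token images are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ExpectedTokensSketch {
    /** Renders each expected token sequence as a space-separated string of token images. */
    static List<String> describe(int[][] expectedTokenSequences, String[] tokenImage) {
        List<String> rendered = new ArrayList<>();
        for (int[] sequence : expectedTokenSequences) {
            StringBuilder builder = new StringBuilder();
            for (int kind : sequence) {
                if (builder.length() > 0) builder.append(' ');
                builder.append(tokenImage[kind]);
            }
            rendered.add(builder.toString());
        }
        return rendered;
    }

    public static void main(String[] args) {
        // Sample data in the style of a generated ...Constants interface.
        String[] tokenImage = { "<EOF>", "\"(\"", "<WORD>" };
        int[][] expected = { { 2 }, { 1, 2 } };
        System.out.println(describe(expected, tokenImage));
    }
}
```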
Class it.unimi.di.big.mg4j.query.parser.QueryParserException extends Exception implements Serializable
- serialVersionUID:
- 1L
-
Class it.unimi.di.big.mg4j.query.parser.Token extends Object implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
beginColumn
int beginColumn
The column number of the first character of this Token. -
beginLine
int beginLine
The line number of the first character of this Token. -
endColumn
int endColumn
The column number of the last character of this Token. -
endLine
int endLine
The line number of the last character of this Token. -
image
String image
The string image of the token. -
kind
int kind
An integer that describes the kind of this token. This numbering system is determined by JavaCCParser, and a table of these numbers is stored in the file ...Constants.java. -
next
Token next
A reference to the next regular (non-special) token from the input stream. If this is the last token from the input stream, or if the token manager has not read tokens beyond this one, this field is set to null. This is true only if this token is also a regular token. Otherwise, see below for a description of the contents of this field. -
specialToken
Token specialToken
This field is used to access special tokens that occur prior to this token, but after the immediately preceding regular (non-special) token. If there are no such special tokens, this field is set to null. When there is more than one such special token, this field refers to the last of these special tokens, which in turn refers to the next previous special token through its specialToken field, and so on until the first special token (whose specialToken field is null). The next fields of special tokens refer to other special tokens that immediately follow them (without an intervening regular token). If there is no such token, this field is null.
-
-
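The specialToken and next links described above can be walked to recover the special tokens preceding a regular token in source order: go back through specialToken to the first special token, then forward through next. The sketch below uses a hypothetical stand-in class Tok with only the three relevant fields; it is not the generated Token class.

```java
import java.util.ArrayList;
import java.util.List;

final class SpecialTokenChainSketch {
    /** Hypothetical stand-in carrying only the fields needed for the walk. */
    static final class Tok {
        final String image;
        Tok next;
        Tok specialToken;

        Tok(String image) {
            this.image = image;
        }
    }

    /** Returns the images of the special tokens preceding a regular token, in source order. */
    static List<String> specialImagesBefore(Tok regular) {
        List<String> images = new ArrayList<>();
        Tok current = regular.specialToken;
        if (current == null) return images;
        // Walk back to the first special token of the run.
        while (current.specialToken != null) current = current.specialToken;
        // Walk forward through the run using the next links.
        for (; current != null; current = current.next) images.add(current.image);
        return images;
    }

    /** Builds a regular token preceded by two special tokens (for example, comments). */
    static Tok demoToken() {
        Tok first = new Tok("/* a */");
        Tok second = new Tok("/* b */");
        first.next = second;
        second.specialToken = first;
        Tok regular = new Tok("foo");
        regular.specialToken = second;
        return regular;
    }

    public static void main(String[] args) {
        System.out.println(specialImagesBefore(demoToken()));
    }
}
```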
Class it.unimi.di.big.mg4j.query.parser.TokenMgrError extends Error implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
errorCode
int errorCode
Indicates the reason why the exception is thrown. It will have one of the above 4 values.
-
-
-
Package it.unimi.di.big.mg4j.search
-
Class it.unimi.di.big.mg4j.search.Index2IntervalIteratorMap extends AbstractReference2ReferenceMap<Index,IntervalIterator> implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
key
Index[] key
The keys (valid up to Index2IntervalIteratorMap.size, excluded). -
size
int size
The number of valid entries in Index2IntervalIteratorMap.key and Index2IntervalIteratorMap.value. -
value
IntervalIterator[] value
The values (parallel to Index2IntervalIteratorMap.key).
-
-
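The three fields above describe a map stored as two parallel arrays plus a count of valid entries. A minimal generic sketch of that layout follows. It is not the MG4J class: ParallelArrayMapSketch is a hypothetical name, and comparing keys by reference (==) is an assumption based on the class extending a reference-based abstract map.

```java
import java.util.Arrays;

final class ParallelArrayMapSketch<K, V> {
    /** The keys, valid up to size (excluded). */
    private Object[] key = new Object[2];
    /** The values, parallel to key. */
    private Object[] value = new Object[2];
    /** The number of valid entries in key and value. */
    private int size;

    @SuppressWarnings("unchecked")
    V get(K k) {
        // Linear scan: adequate when the map holds only a handful of entries.
        for (int i = 0; i < size; i++) {
            if (key[i] == k) return (V) value[i];
        }
        return null;
    }

    @SuppressWarnings("unchecked")
    V put(K k, V v) {
        for (int i = 0; i < size; i++) {
            if (key[i] == k) {
                V old = (V) value[i];
                value[i] = v;
                return old;
            }
        }
        if (size == key.length) {
            // Grow both arrays together so they stay parallel.
            key = Arrays.copyOf(key, size * 2);
            value = Arrays.copyOf(value, size * 2);
        }
        key[size] = k;
        value[size] = v;
        size++;
        return null;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        ParallelArrayMapSketch<Object, String> map = new ParallelArrayMapSketch<>();
        Object a = new Object(), b = new Object(), c = new Object();
        map.put(a, "first");
        map.put(b, "second");
        map.put(c, "third");
        System.out.println(map.size() + " " + map.get(b));
    }
}
```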
-
Package it.unimi.di.big.mg4j.tool
-
Class it.unimi.di.big.mg4j.tool.URLMPHVirtualDocumentResolver extends Object implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
url2DocumentPointer
StringMap<? extends CharSequence> url2DocumentPointer
The term map used by this resolver to associate URI strings with numbers.
-
-
-
Package it.unimi.di.big.mg4j.util.parser.callback
-
Class it.unimi.di.big.mg4j.util.parser.callback.AnchorExtractor.Anchor extends Object implements Serializable
- serialVersionUID:
- 1L
-
Serialized Fields
-
anchorText
MutableString anchorText
The text surrounding this anchor. -
href
MutableString href
The content of the href attribute for this anchor.
-
-