MG4J

MG4J (Managing Gigabytes for Java) is a free full-text search engine, written in Java, for large document collections. MG4J is a highly customisable, high-performance, full-fledged search engine that provides state-of-the-art features (such as BM25/BM25F scoring) as well as new research algorithms.
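BM25 scoring is named above only as a feature. As a rough illustration of what a BM25 scorer computes, here is a minimal Java sketch of the classic Okapi BM25 formula. It is not MG4J's implementation. The class `Bm25`, its methods and parameters, and the choice of the non-negative IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)) are all assumptions made for this example. BM25F, which weights document fields separately, is not covered.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of classic Okapi BM25 for a bag-of-words query.
 * Not MG4J's scorer: names, parameters and the IDF variant are assumptions.
 */
final class Bm25 {
    private final double k1; // term-frequency saturation (often set around 1.2)
    private final double b;  // strength of document-length normalization, in [0, 1]

    Bm25(double k1, double b) {
        this.k1 = k1;
        this.b = b;
    }

    /** Non-negative IDF variant: ln(1 + (N - n + 0.5) / (n + 0.5)). */
    static double idf(long numDocs, long docFreq) {
        return Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Contribution of one term occurring tf times in a document of length docLength. */
    double termScore(int tf, int docLength, double avgDocLength, long numDocs, long docFreq) {
        // Longer-than-average documents get a larger denominator, hence a lower score.
        double lengthNorm = k1 * (1.0 - b + b * docLength / avgDocLength);
        return idf(numDocs, docFreq) * tf * (k1 + 1.0) / (tf + lengthNorm);
    }

    /** Sums term contributions; query terms absent from the document add nothing. */
    double score(List<String> query, Map<String, Integer> termFreqs, int docLength,
                 double avgDocLength, long numDocs, Map<String, Long> docFreqs) {
        double total = 0.0;
        for (String term : query) {
            int tf = termFreqs.getOrDefault(term, 0);
            if (tf == 0) {
                continue;
            }
            long df = docFreqs.getOrDefault(term, 0L);
            total += termScore(tf, docLength, avgDocLength, numDocs, df);
        }
        return total;
    }
}
```

With b = 0 the length normalization disappears and a single occurrence (tf = 1) scores exactly the IDF; a larger k1 lets repeated occurrences keep adding weight before the term-frequency component saturates.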

The main points of MG4J are:

The starting point for understanding MG4J is the tutorial, which explains how to index a sample collection and query the newly built index from the command line or through a browser. The Javadoc class documentation then provides further insight.

MG4J is free software distributed under the GNU Lesser General Public License. If you find MG4J useful, we kindly ask you to cite the following reference:

        title = "{M}{G}4{J} at {T}{R}{E}{C} 2005",
        author = "Paolo Boldi and Sebastiano Vigna",
        year = 2005,
        booktitle = "The Fourteenth Text REtrieval Conference (TREC 2005) Proceedings",
        editor = "Ellen M. Voorhees and Lori P. Buckland",
        publisher = "NIST",
        series = "Special Publications",
        number = "SP 500-266",


Install

You can get MG4J from Maven Central. Alternatively, you just need to install the .jar file that comes with the distribution, together with its dependencies, which are gathered for your convenience in a tarball.


Here you can find (in no particular order) research papers that have been written using MG4J. The list is not exhaustive, and we will be happy to include works that are missing.