Package it.unimi.di.big.mg4j.tool
Class FilterOutWikipediaDuplicates
java.lang.Object
    it.unimi.di.big.mg4j.tool.FilterOutWikipediaDuplicates
public class FilterOutWikipediaDuplicates extends Object
Reads a Wikipedia XML dump and outputs the same dump after eliminating duplicate pages. A duplicate page is a page whose title appeared earlier in the XML stream.
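The title-based filtering described above can be illustrated with a minimal sketch. This is not the actual implementation of this class: the class name TitleDedupSketch and the method dedupe are hypothetical, and the sketch assumes that each page is a <page> element containing a <title> child, as in standard MediaWiki export dumps. It buffers the StAX events of each page and copies them to the output only if the title has not been seen before.

```java
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical illustration; not the code of FilterOutWikipediaDuplicates.
final class TitleDedupSketch {

    /** Copies the XML from in to out, dropping every page whose title was seen earlier. */
    static void dedupe(Reader in, Writer out) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        Set<String> seenTitles = new HashSet<>();
        List<XMLEvent> page = null;   // non-null while inside a <page> element
        StringBuilder title = null;   // non-null while inside a <title> element
        String pageTitle = null;      // title of the page being buffered, once known

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();

            if (page == null) {
                if (event.isStartElement()
                        && "page".equals(event.asStartElement().getName().getLocalPart())) {
                    // Start buffering a new page instead of writing it immediately.
                    page = new ArrayList<>();
                    page.add(event);
                    pageTitle = null;
                } else {
                    // Everything outside a page is copied through unchanged.
                    writer.add(event);
                }
                continue;
            }

            page.add(event);
            if (event.isStartElement()) {
                if ("title".equals(event.asStartElement().getName().getLocalPart())) {
                    title = new StringBuilder();
                }
            } else if (event.isCharacters() && title != null) {
                // Text may arrive in several chunks, so accumulate it.
                title.append(event.asCharacters().getData());
            } else if (event.isEndElement()) {
                String name = event.asEndElement().getName().getLocalPart();
                if ("title".equals(name) && title != null) {
                    if (pageTitle == null) pageTitle = title.toString();
                    title = null;
                } else if ("page".equals(name)) {
                    // Set.add returns false when the title was already present.
                    if (pageTitle == null || seenTitles.add(pageTitle)) {
                        for (XMLEvent buffered : page) writer.add(buffered);
                    }
                    page = null;
                }
            }
        }
        writer.close();
    }
}
```

Under these assumptions, the first page with a given title is kept and every later page with the same title is dropped. The sketch holds all seen titles in memory, which is a simplification that may not suit a full-size dump.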
Method Detail
main
public static void main(String[] arg) throws IOException, com.martiansoftware.jsap.JSAPException, XMLStreamException
Throws:
IOException
com.martiansoftware.jsap.JSAPException
XMLStreamException