The following resources are available:
Documentation
General-purpose releases
These releases should compile under any version of Linux with gcc 3.0 or greater.
If they do not, please send me an email.
-
Wumpus 2007-11-23 (.tar.gz, 1089 KB)
This release fixes some minor problems as well as an annoying off-by-one
bug that sometimes appeared when fetching the text for a given index
extent in cases where the text was not properly aligned with the index data.
-
Wumpus 2007-09-07 (.tar.gz, 1102 KB)
This release introduces a new on-disk index format. The new
version can read the index files created by an older version, but older
versions cannot process index files produced by the new version!
-
Wumpus 2007-06-18 (.tar.gz, 1086 KB)
TREC Terabyte release
-
Wumpus 2006-05-03-TREC (.tar.gz, 884 KB)
Use this release if you want to do a performance baseline run for TREC Terabyte 2006.
After downloading the package, untar it and change to the wumpus/ directory.
Then type make. If the build process fails, please send me a bug report so
that I can try to fix the problem. Under Linux, everything should work smoothly.
Once the distribution is built, there are two run modes. From within the
wumpus/ directory, execute
-
bin/trec INDEX input_file output_file log_file
in order to build an index for the document collection. The index will be created
in wumpus/database/. The command-line parameter input_file
must refer to a file that contains a list of all files that are to be indexed (these
files may either be plain text, .gz, or .bz2, but must be in TREC format). Nothing
is written to the file given by output_file. Log messages, including
performance information, are written to log_file.
-
bin/trec QUERY input_file output_file log_file
in order to run search queries against the index built by following the instructions
above. The command-line parameter input_file must refer to a file
that contains all ad-hoc search queries that are to be executed. The queries in the
input file must have the format "topic_id term_1 term_2 .. term_N". To see the exact
format, you can download a set of example queries.
TREC-formatted search results are written to output_file. Log messages,
including performance information, are written to log_file.
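The two input files described above can be prepared with a short script. The following Python sketch is only an illustration and is not part of the Wumpus distribution. The helper names (write_file_list, format_query, write_query_file) are hypothetical, and the sketch assumes that the INDEX input file lists one path per line, which the description above does not state explicitly. The query lines follow the "topic_id term_1 term_2 .. term_N" format quoted above.

```python
import os


def write_file_list(collection_dir, list_path):
    """Write the absolute path of every file under collection_dir to list_path.

    Assumption: the INDEX input file holds one path per line.
    The files themselves must already be in TREC format (plain text, .gz, or .bz2).
    """
    list_abs = os.path.abspath(list_path)
    paths = []
    for root, _dirs, files in os.walk(collection_dir):
        for name in files:
            full = os.path.abspath(os.path.join(root, name))
            if full != list_abs:  # do not list the list file itself
                paths.append(full)
    paths.sort()  # stable order makes runs easier to compare
    with open(list_path, "w", encoding="utf-8") as out:
        for path in paths:
            out.write(path + "\n")
    return paths


def format_query(topic_id, terms):
    """Return one query line of the form 'topic_id term_1 term_2 .. term_N'."""
    terms = list(terms)
    if not terms:
        raise ValueError("a query needs at least one term")
    return " ".join([str(topic_id)] + terms)


def write_query_file(queries, query_path):
    """Write (topic_id, terms) pairs to query_path, one query per line."""
    lines = [format_query(topic_id, terms) for topic_id, terms in queries]
    with open(query_path, "w", encoding="utf-8") as out:
        for line in lines:
            out.write(line + "\n")
    return lines
```

For example, format_query(801, ["example", "query", "terms"]) yields the line "801 example query terms" (the topic id and terms here are made up). The resulting files can then be passed as input_file to bin/trec INDEX and bin/trec QUERY, respectively.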
System requirements: In order to perform a Wumpus performance baseline run for the
GOV2 document collection, you need at least 512 MB of RAM and 40 GB of free hard disk space
on the partition where you install Wumpus.
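Before starting a baseline run, the disk-space requirement can be checked with a few lines of Python. This is a minimal sketch, not something shipped with Wumpus: the function name has_enough_disk is hypothetical, and the 40 GB figure is interpreted here as decimal gigabytes, which is an assumption. The RAM requirement is not checked, because the Python standard library has no portable call for it.

```python
import shutil

# 40 GB from the system requirements, read as decimal gigabytes (assumption).
REQUIRED_FREE_BYTES = 40 * 1000**3


def has_enough_disk(path, required_bytes=REQUIRED_FREE_BYTES):
    """Return True if the partition containing path has at least required_bytes free."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes
```

Calling has_enough_disk("wumpus/") from the directory that contains the installation reports whether that partition currently has enough free space.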
Wumpus is free software and is licensed under the GNU General Public License (GPL).