SWISH Changes
Version 1.1.1 - March 14, 1995
- A serious bug in merging index files appears to have been fixed. You may still experience problems when merging more than two large (> 1 MB) index files at once.
Version 1.1 - March 11, 1995
The most prominent changes are:
- SWISH now has a simple parser - you can use AND, OR, and NOT operations as well as parentheses to nest keywords. You can use wildcards to search for the beginnings of words. For example, you could run a search such as "((this and that) or (not apples and ora*))".
- The index format has changed slightly to make searching at least three times faster on average, thanks to greater use of file offsets to locate data. Searching large datasets now takes much less time.
- You can specify multiple directories and files to index, either on the command line or in the configuration file.
- You can specify multiple files to search.
- You can specify custom lists of stopwords (words that are too common to use in a search) within the configuration file - these are included in the generated index. Stopwords can also be automatically generated by SWISH as it indexes, according to user-settable parameters.
- You can merge multiple (two or more) index files. Doing so removes all redundant data, and the operation takes up much less memory than the size of the index files to be merged.
- Index files can now include file and word sizes, a title, the creation date, and other administrative information.
- HTML numbered entities can be converted to named entities (for instance, &#169; can be converted to &copy;). They can also be converted to 7-bit ASCII equivalents when possible, so you could search for "resumé" as "resume".
- Context information is saved in index files, so you can now search for words that exist in titles, <HEAD> elements, <BODY> elements, comments, header tags, emphasized (<B>, <I>, <EM>, <STRONG>) tags, or any mixture of these criteria.
- You can define what characters make up a word and other aspects of what constitutes a word, so you can index textual data more efficiently.
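
To illustrate the kind of query the version 1.1 parser accepts, here is a minimal sketch in Python of evaluating AND, OR, NOT, parentheses, and trailing-* wildcards against a toy in-memory index. It is not SWISH's actual code or index format: the INDEX table, the document ids, the function names, and the operator precedence (NOT binds tighter than AND, which binds tighter than OR) are all assumptions made for the example.

```python
import re

# Hypothetical toy index: word -> set of document ids (not SWISH's real format).
INDEX = {
    "this": {1, 2},
    "that": {2, 3},
    "apples": {1, 3},
    "oranges": {3, 4},
    "oracle": {2},
}
ALL_DOCS = {1, 2, 3, 4}


def tokenize(query):
    # Split into parentheses and words; a word may end in '*'.
    return re.findall(r"[()]|[^\s()]+", query.lower())


def lookup(term):
    # A trailing '*' matches every indexed word that starts with the prefix.
    if term.endswith("*"):
        prefix = term[:-1]
        hits = set()
        for word, docs in INDEX.items():
            if word.startswith(prefix):
                hits |= docs
        return hits
    return set(INDEX.get(term, set()))


def search(query):
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        # Lowest precedence (assumed): a OR b OR ...
        result = parse_and()
        while peek() == "or":
            take()
            result = result | parse_and()
        return result

    def parse_and():
        result = parse_not()
        while peek() == "and":
            take()
            result = result & parse_not()
        return result

    def parse_not():
        # NOT is evaluated as the complement within the toy document set.
        if peek() == "not":
            take()
            return ALL_DOCS - parse_not()
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            result = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return result
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        return lookup(tok)

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result
```

With this toy index, search("((this and that) or (not apples and ora*))") returns the ids of documents that contain both "this" and "that", or that lack "apples" but contain a word beginning with "ora".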
Version 1.0 - November 4, 1994
- Initial release - inspired by many nights of trying to configure WAIS. :)
Thanks to Alan Schiffman and Eric Rescorla for suggesting and helping with SWISH's compression scheme.