Search Help


The search mechanism supports a large variety of patterns, including simple strings, strings with classes of characters, sets of strings, wild cards, and regular expressions.

Summary
Rule         Explanation                                              To search for...          Enter...
Boolean AND  To search for multiple terms, separate by semicolons     larry AND moe AND curly   larry;moe;curly
Boolean OR   To search for any of several terms, separate by commas   larry OR moe OR curly     larry,moe,curly

Strings
Strings are any sequence of characters, including the special symbols `^' for beginning of line and `$' for end of line. The special characters `$', `^', `*', `[', `|', `(', `)', `!', and `\', as well as the meta characters special to the search (`;', `,', `#', `<', `>', `-', and `.'), should be preceded by `\' if they are to be matched as regular characters. For example, \^abc\\ corresponds to the string ^abc\, whereas ^abc corresponds to the string abc at the beginning of a line.
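The escaping rule above can be sketched in a few lines of Python. This is only an illustration: the helper name escape_literal and the SPECIAL set are made up for this example and are not part of the search tool. The sketch simply puts a backslash in front of every character the paragraph lists.

```python
# Characters the help text lists as needing a backslash to be matched literally.
# SPECIAL and escape_literal are hypothetical names used only in this sketch.
SPECIAL = set("$^*[|()!\\") | set(";,#<>-.")


def escape_literal(text: str) -> str:
    """Return text with every special or meta character preceded by a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


print(escape_literal("a.b-c"))   # a\.b\-c
print(escape_literal("$5,000"))  # \$5\,000
```

A string with no special characters, such as larry, passes through unchanged.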

Classes of characters
A list of characters inside [] (in order) corresponds to any character from the list. For example, [a-ho-z] is any character between a and h or between o and z. The symbol `^' inside [] complements the list. For example, [^i-n] denotes any character in the character set except the characters `i' through `n'. The symbol `^' thus has two meanings, but this is consistent with egrep. The symbol `.' stands for any symbol (except for the newline symbol).
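The class examples above can be tried out with Python's re module, which uses the same bracket notation for these simple cases. This is a sketch with Python's regular expression engine, not the search tool itself, and the helper name in_class is invented for the example.

```python
import re


def in_class(char_class: str, ch: str) -> bool:
    """Return True if the single character ch matches the given class."""
    return re.fullmatch(char_class, ch) is not None


print(in_class("[a-ho-z]", "c"))  # True: c lies between a and h
print(in_class("[a-ho-z]", "k"))  # False: k lies in the gap between h and o
print(in_class("[^i-n]", "k"))    # False: k is excluded by the complement
print(in_class("[^i-n]", "a"))    # True
print(in_class(".", "\n"))        # False: . does not match the newline
```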

Boolean operations
The search supports an `AND' operation, denoted by the symbol `;', an `OR' operation, denoted by the symbol `,', or any combination of the two. For example, `pizza;cheeseburger' will output all lines containing both patterns, and `pizza,cheeseburger' will output all lines containing either pattern.
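As a rough model of these two operators, the following Python sketch tests one line of text against a pattern made of plain strings. The function name line_matches is hypothetical. The sketch assumes the pattern uses only `;' or only `,', because this page does not say how mixed patterns are grouped, and it ignores backslash escapes.

```python
def line_matches(pattern: str, line: str) -> bool:
    """Test a line against a pattern of plain terms joined by ';' (AND) or ',' (OR)."""
    if ";" in pattern and "," in pattern:
        # Grouping rules for mixed patterns are not described here, so refuse them.
        raise ValueError("mixed AND/OR patterns are not handled in this sketch")
    if ";" in pattern:
        return all(term in line for term in pattern.split(";"))
    return any(term in line for term in pattern.split(","))


print(line_matches("pizza;cheeseburger", "pizza and a cheeseburger"))  # True
print(line_matches("pizza;cheeseburger", "pizza only"))                # False
print(line_matches("larry,moe,curly", "moe was here"))                 # True
```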

Wild cards
The symbol `#' is used to denote a sequence of any number (including 0) of arbitrary characters. The symbol # is equivalent to .* in egrep. In fact, .* will work too, because it is a valid regular expression (see below), but unless this is part of an actual regular expression, # will work faster.
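The equivalence between # and .* can be illustrated by translating a wild-card pattern into a Python regular expression. This is a sketch only: wildcard_to_regex is a made-up helper, it treats everything except # as literal text, and it does not handle an escaped \#.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a pattern of literal text and '#' wild cards into a regex."""
    literal_parts = pattern.split("#")
    # Each '#' becomes '.*'; the literal pieces are escaped so they match as-is.
    return re.compile(".*".join(re.escape(part) for part in literal_parts))


rx = wildcard_to_regex("abc#xyz")
print(bool(rx.search("abcxyz")))       # True: '#' may match zero characters
print(bool(rx.search("abc 123 xyz")))  # True
print(bool(rx.search("xyz abc")))      # False: the order matters
```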

Combination of exact and approximate matching
Any part of a pattern enclosed in angle brackets <> must match the text exactly, even when the overall match allows errors. For example, <mathemat>ics matches mathematical with one error (replacing the last s with an a), but mathe<matics> does not match mathematical no matter how many errors are allowed.

Regular expressions
Since the index is word based, a regular expression must match words that appear in the index for the search to find it. The search first strips all non-alphabetic characters from the regular expression and searches the index for the remaining words. It then applies the regular expression matching algorithm to the files found in the index. For example, `abc.*xyz' will search the index for all files that contain both `abc' and `xyz', and then search directly for `abc.*xyz' in those files. The union operation `|', Kleene closure `*', and parentheses () are all supported. Currently `+' is not supported. Regular expressions are currently limited to approximately 30 characters (generally excluding meta characters). The maximal number of errors for regular expressions that use `*' or `|' is 4.
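The two-stage procedure described above can be sketched in Python with a toy word index. Everything here is an assumption made for illustration: the index is a plain dictionary from word to file names, the function names are invented, and Python's re module stands in for the search's own matcher (it accepts constructs such as `+' that the search does not).

```python
import re


def build_index(files: dict) -> dict:
    """Map each alphabetic word to the set of file names that contain it."""
    index = {}
    for name, text in files.items():
        for word in re.findall(r"[A-Za-z]+", text):
            index.setdefault(word, set()).add(name)
    return index


def index_words(regex: str) -> list:
    """Strip non-alphabetic characters from the regex and return the remaining words."""
    return re.sub(r"[^A-Za-z]", " ", regex).split()


def search(regex: str, index: dict, files: dict) -> list:
    # Stage 1: keep only files whose index entries contain every remaining word.
    candidates = set(files)
    for word in index_words(regex):
        candidates &= index.get(word, set())
    # Stage 2: run the full regular expression on each line of the candidates.
    compiled = re.compile(regex)
    return sorted(
        name for name in candidates
        if any(compiled.search(line) for line in files[name].splitlines())
    )


files = {
    "a.txt": "abc then xyz",
    "b.txt": "xyz\nabc",
    "c.txt": "abc only",
}
print(search("abc.*xyz", build_index(files), files))  # ['a.txt']
```

In this toy data, c.txt is rejected by the index lookup because it lacks xyz, while b.txt passes the index lookup but fails the line-by-line regular expression match.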

