Boolean Retrieval

Grepping through text can be a very effective process, especially given the speed of modern computers. It often allows useful possibilities for wildcard pattern matching through the use of regular expressions. With modern computers, for simple querying of modest collections, you really need nothing more. But for many purposes, you do need more to process large document collections quickly.

In many cases you want the best answer to an information need among many documents that contain certain words. The way to avoid linearly scanning the texts for each query is to index the documents in advance. Terms are the indexed units, and for the moment you can think of them as words.

The model views each document as just a set of words. By documents we mean whatever units we have decided to build a retrieval system over.

The simplest form of document retrieval is for a computer to do this sort of linear scan through documents. This process is commonly GREP referred to as grepping through text, after the Unix command grep, which performs this process.

The Boolean retrieval model is a model for information retrieval in which we MODEL can pose any query which is in the form of a Boolean expression of terms, that is, in which terms are combined with the operators AND, OR, and NOT. The model views each document as just a set of words.


Inforetrievalauto2

An Introduction to Information Retrieval: Manning