Information Retrieval

What is information retrieval?

Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).

Information retrieval is fast becoming the dominant form of information access, overtaking traditional database-style searching. The term “unstructured data” refers to data that does not have a clear, semantically overt, easy-for-a-computer structure. In reality, almost no data is truly unstructured: this is certainly true of text once you count the latent linguistic structure of human languages.

The field of information retrieval also covers supporting users in browsing or filtering document collections, or in further processing a set of retrieved documents. Information retrieval systems can also be distinguished by the scale at which they operate, and it is useful to distinguish three prominent scales. In web search, the system has to provide search over billions of documents stored on millions of computers. At the other extreme is personal information retrieval.

Email programs, for example, usually provide not only search but also text classification. Distinctive issues at this personal scale include handling the broad range of document types on a typical personal computer. In between the two extremes is the space of enterprise, institutional, and domain-specific search. This book contains techniques of value over this whole spectrum.

An Introduction to Information Retrieval: Manning