|
Information retrieval (IR) is the art and science of searching for information in documents, searching for documents themselves, searching for metadata which describe documents, or searching within databases, whether relational stand-alone databases or hypertext networked databases such as the Internet or intranets, for text, sound, images or data. There is a common confusion, however, between data retrieval, document retrieval, information retrieval, and text retrieval, and each of these has its own body of literature, theory, practice and technologies.
The term "information retrieval" was coined by Calvin Mooers in 1948-50.
IR is a broad interdisciplinary field that draws on many other disciplines. Indeed, because it is so broad, it is commonly poorly understood, being approached typically from only one perspective or another. It stands at the junction of many established fields, and draws upon cognitive psychology, information architecture, information design, human information behaviour, linguistics, semiotics, information science, computer science and librarianship.
Automated information retrieval (IR) systems were originally used to manage the information explosion in scientific literature over the last few decades. Many universities and public libraries use IR systems to provide access to books, journals, and other documents. IR systems are typically described in terms of objects and queries. Queries are formal statements of information needs that are put to an IR system by the user. An object is an entity which keeps or stores information in a database. User queries are matched to documents stored in the database. The document is, therefore, the data object. Often the documents themselves are not kept or stored directly in the IR system, but are instead represented in the system by document surrogates.
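The matching of a query against document surrogates can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy collection, the use of a plain set of index terms as the surrogate, and the hypothetical `match` function, which requires every query term to be present. Real IR systems use far richer representations and ranking.

```python
# Hypothetical toy collection: each document is represented by a surrogate
# (here simply a set of index terms) rather than by its full text.
surrogates = {
    "doc1": {"information", "retrieval", "systems"},
    "doc2": {"library", "catalogue", "systems"},
    "doc3": {"information", "science", "library"},
}


def match(query_terms, surrogates):
    """Return the ids of documents whose surrogate contains every query term."""
    wanted = set(query_terms)
    # `wanted <= terms` is a subset test: all query terms occur in the surrogate.
    return sorted(doc_id for doc_id, terms in surrogates.items() if wanted <= terms)


print(match(["information"], surrogates))
print(match(["library", "systems"], surrogates))
```

The query is compared only against the surrogates, never against the documents themselves, which is the point the paragraph above makes.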
In 1992 the US Department of Defense, along with the National Institute of Standards and Technology (NIST), cosponsored the Text Retrieval Conference (TREC) as part of the TIPSTER text program. The aim was to support the information retrieval community by supplying the infrastructure needed for large-scale evaluation of text retrieval methodologies.
Web search engines such as Google and Lycos are amongst the most visible applications of information retrieval research.
Performance measures
There are various ways to measure how well the retrieved information matches the intended information:
Precision
The proportion of relevant documents among all documents retrieved:
<math>\text{precision} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|}</math>
In binary classification, precision is analogous to positive predictive value.
Precision can also be evaluated at a given cut-off rank, denoted P@n, instead of over all retrieved documents.
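As a minimal sketch of the definition above, precision and P@n can be computed from a ranked result list and a set of known relevant documents. The function names, the document ids and the zero returned for an empty result list are assumptions of this example, not part of any standard library.

```python
def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    retrieved = list(retrieved)
    if not retrieved:
        return 0.0  # assumed convention for an empty result list
    relevant = set(relevant)
    hits = sum(1 for doc in retrieved if doc in relevant)
    return hits / len(retrieved)


def precision_at(n, ranked, relevant):
    """P@n: precision over the top n results of a ranked list.

    If fewer than n results exist, this sketch divides by the number
    actually returned, which is an assumption of the example.
    """
    return precision(ranked[:n], relevant)


# Hypothetical ranked output and relevance judgements.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}

print(precision(ranked, relevant))        # 2 of the 5 retrieved are relevant
print(precision_at(2, ranked, relevant))  # 1 of the top 2 is relevant
```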
Recall
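The proportion of relevant documents that are retrieved, out of all relevant documents available:
<math>\text{recall} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|}</math>
In binary classification, recall is called sensitivity.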
F-measure
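The harmonic mean of precision and recall:
<math>F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}</math>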
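Recall and the F-measure can be sketched in the same spirit. The helper names and the sample data are hypothetical, and returning zero when a denominator would be zero is an assumed convention for this example.

```python
def recall(retrieved, relevant):
    """Fraction of all relevant documents that were retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # assumed convention when nothing is relevant
    return len(relevant & set(retrieved)) / len(relevant)


def f_measure(p, r):
    """Harmonic mean of precision p and recall r."""
    if p + r == 0:
        return 0.0  # assumed convention when both are zero
    return 2 * p * r / (p + r)


# Hypothetical result list and relevance judgements.
retrieved = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}

# Precision is computed inline so the example stays self-contained.
p = len(relevant & set(retrieved)) / len(retrieved)  # 2 / 5
r = recall(retrieved, relevant)                      # 2 / 3
print(p, r, f_measure(p, r))
```

Because the harmonic mean is dominated by the smaller of its two inputs, a system cannot score well on the F-measure by maximising only precision or only recall.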
Mean average precision
Over a set of queries, find the mean of the average precisions, where average precision is the average of the precision after each relevant document is retrieved.
Where r is the rank, N the number retrieved, rel() a binary function on the relevance of a given rank, and P() precision at a given cut-off rank:
<math>\text{AveP} = \frac{\sum_{r=1}^{N} \left( P(r) \times \mathrm{rel}(r) \right)}{\text{number of relevant documents}}</math>
This method emphasizes returning more relevant documents earlier.
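A minimal sketch of average precision and its mean over several queries follows. It assumes that the sum of P(r) over relevant ranks is divided by the total number of relevant documents for the query. The function names and sample runs are hypothetical.

```python
def average_precision(ranked, relevant):
    """Sum P(r) at each rank r holding a relevant document, then normalise.

    Assumption: the sum is divided by the total number of relevant
    documents, so relevant documents that are never retrieved count as zero.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for r, doc in enumerate(ranked, start=1):
        if doc in relevant:    # rel(r) = 1
            hits += 1
            total += hits / r  # P(r): precision at cut-off rank r
    return total / len(relevant)


def mean_average_precision(runs):
    """Mean of the average precision over (ranked, relevant) pairs, one per query."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Two hypothetical queries with their ranked results and relevance judgements.
runs = [
    (["d3", "d1", "d7", "d2", "d9"], {"d1", "d2", "d5"}),
    (["a", "b"], {"a"}),
]
print(mean_average_precision(runs))
```

Moving a relevant document from a late rank to an early rank raises P(r) at that rank, which is why the measure rewards returning relevant documents earlier.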
Model types
For successful IR, it is necessary to represent the documents in some way. There are a number of models for this purpose, roughly divisible into three main groups:
Set-theoretic / Boolean models
Standard Boolean model
Extended Boolean model
Fuzzy retrieval
Algebraic / vector space models
Vector space model
Generalized vector space model
Topic-based vector space model
Enhanced topic-based vector space model
Latent semantic indexing, also known as latent semantic analysis
Probabilistic models
Binary independence retrieval
Uncertain inference
Language models
Divergence from randomness models
Open source information retrieval systems
[http://www.lemurproject.org/ Lemur] Language Modelling IR Toolkit
[http://lucene.apache.org/java/docs/ Lucene] Apache Jakarta project
[ftp://ftp.cs.cornell.edu/pub/smart/ SMART] Early IR engine from Cornell University
[http://ir.dcs.gla.ac.uk/terrier Terrier] Information Retrieval Platform
[http://www.xapian.org/ Xapian] Open source IR platform based on Muscat
[http://www.seg.rmit.edu.au/zettair/ Zettair]
[http://www.htdig.org/ ht://dig] Open source web crawling software
[http://www.nzdl.org/html/mg.html MG full-text retrieval system] Currently maintained by the Greenstone Digital Library Software Project
[http://www.cs.uni.edu/~okane/source/ISR/isr.html Information Storage and Retrieval Using Mumps] (Online GPL Text)
Major information retrieval research groups
[http://ir.dcs.gla.ac.uk Glasgow Information Retrieval Group]
[http://ciir.cs.umass.edu/ Center for Intelligent Information Retrieval]
[http://www.ir.iit.edu/ IIT Information Retrieval Lab]
[http://www.dcs.vein.hu/CIR/ CIR Centre for Information Retrieval]
Major figures in information retrieval
Calvin Mooers
Eugene Garfield
Gerard Salton
W. Bruce Croft
Karen Spärck Jones
C. J. van Rijsbergen
S. Dominich
Awards in this field
Tony Kent Strix award
ACM SIGIR Gerard Salton Award
; 1983 - Gerard Salton, Cornell University : "About the future of automatic information retrieval"
; 1988 - Karen Spärck Jones, University of Cambridge : "A look back and a look forward"
; 1991 - Cyril Cleverdon, Cranfield Institute of Technology : "The significance of the Cranfield tests on index languages"
; 1994 - William S. Cooper, University of California, Berkeley : "The formalism of probability theory in IR: a foundation or an encumbrance?"
; 1997 - Tefko Saracevic, Rutgers University : "Users lost: reflections on the past, future, and limits of information science"
; 2000 - Stephen E. Robertson, City University London : "On theoretical argument in information retrieval"
; 2003 - W. Bruce Croft, University of Massachusetts, Amherst : "Information retrieval and computer science: an evolving relationship"
|