Information Retrieval



Information retrieval (IR) is the art and science of searching for information in documents, searching for documents themselves, searching for metadata which describe documents, or searching within databases, whether relational stand-alone databases or hypertext networked databases such as the Internet or intranets, for text, sound, images or data. There is a common confusion, however, between data retrieval, document retrieval, information retrieval, and text retrieval, and each of these has its own bodies of literature, theory, practice and technologies.

The term "information retrieval" was coined by Calvin Mooers in 1948-50.

IR is a broad interdisciplinary field that draws on many other disciplines. Indeed, because it is so broad, it is commonly ill understood, being approached typically from only one perspective or another. It stands at the junction of many established fields, and draws upon cognitive psychology, information architecture, information design, human information behaviour, linguistics, semiotics, information science, computer science and librarianship.

Automated information retrieval (IR) systems were originally used to manage the information explosion in scientific literature over the last few decades. Many universities and public libraries use IR systems to provide access to books, journals, and other documents. IR systems are often described in terms of objects and queries. Queries are formal statements of information needs that are put to an IR system by the user. An object is an entity which keeps or stores information in a database. User queries are matched against the documents stored in the database. The document is, therefore, the data object. Often the documents themselves are not kept or stored directly in the IR system, but are instead represented in the system by document surrogates.
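As a rough illustration of a query being matched against document surrogates rather than the documents themselves, here is a minimal Python sketch. The `Surrogate` record, its fields, the `match` function and the sample catalogue are all hypothetical, made up for this example; real systems use far richer records and matching rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Surrogate:
    """Stand-in record for a document; the full text is held elsewhere."""
    doc_id: str
    title: str
    keywords: frozenset
    location: str  # hypothetical pointer to where the document is kept


def match(query, surrogates):
    """Return surrogates sharing at least one term with the query,
    ordered by how many query terms they share (largest overlap first)."""
    terms = set(query.lower().split())
    scored = []
    for s in surrogates:
        overlap = len(terms & s.keywords)
        if overlap:
            scored.append((overlap, s))
    # sort on the overlap count only; ties keep their catalogue order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored]


# Made-up sample data for illustration only.
catalogue = [
    Surrogate("d1", "Sample book on text indexing",
              frozenset({"text", "indexing", "retrieval"}), "shelf A3"),
    Surrogate("d2", "Sample cataloguing manual",
              frozenset({"cataloguing", "library"}), "shelf B1"),
    Surrogate("d3", "Sample report on evaluation",
              frozenset({"evaluation", "retrieval"}), "shelf A4"),
]

hits = match("text retrieval", catalogue)
for s in hits:
    print(s.doc_id, s.title, s.location)
```

The query never touches the stored documents: it is compared only with the keyword sets in the surrogates, and the `location` field tells the user where the matching document can be found.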

In 1992 the Department of Defense, along with the National Institute of Standards and Technology (NIST), cosponsored the Text Retrieval Conference (TREC) as part of the TIPSTER text program. The aim was to support the information retrieval community by supplying the infrastructure needed for large-scale evaluation of text retrieval methodologies.

Web search engines such as Google and Lycos are amongst the most visible applications of information retrieval research.

Performance measures

There are various ways to measure how well the retrieved information matches the intended information:

Precision

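The proportion of relevant documents out of all documents retrieved:

$$\text{precision} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|}$$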

In binary classification, precision is analogous to positive predictive value. Precision can also be evaluated at a given cut-off rank, denoted P@n, instead of over all retrieved documents.
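Precision and P@n can be computed directly from a ranked result list and a set of relevance judgements. The following is a minimal Python sketch; the function names and the sample document ids are assumptions made for illustration, and returning 0.0 for an empty result list is a convention chosen here, not something the definition prescribes.

```python
def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0  # convention assumed here for an empty result list
    relevant = set(relevant)
    hits = sum(1 for doc in retrieved if doc in relevant)
    return hits / len(retrieved)


def precision_at(ranked, relevant, n):
    """P@n: relevant documents among the top n results, divided by n (n > 0)."""
    relevant = set(relevant)
    hits = sum(1 for doc in ranked[:n] if doc in relevant)
    return hits / n


# Made-up ranking and relevance judgements.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}

print(precision(ranked, relevant))        # 2 of the 5 retrieved are relevant
print(precision_at(ranked, relevant, 2))  # 1 of the top 2 is relevant
```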

Recall

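The proportion of relevant documents that are retrieved, out of all relevant documents available:

$$\text{recall} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|}$$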

In binary classification, recall is called sensitivity.
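Recall uses the same intersection of retrieved and relevant documents as precision, but divides by the number of relevant documents instead. A minimal Python sketch, with a hypothetical function name and made-up sample data; returning 0.0 when no relevant documents exist is an assumption of this sketch.

```python
def recall(retrieved, relevant):
    """Fraction of all relevant documents that appear in the retrieved set."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # convention assumed here when nothing is relevant
    found = len(relevant & set(retrieved))
    return found / len(relevant)


# Made-up result list and relevance judgements.
retrieved = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}

print(recall(retrieved, relevant))  # 2 of the 3 relevant documents were retrieved
```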

F-measure

The harmonic mean of precision and recall:

$$F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
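Because the harmonic mean is dominated by the smaller of its two inputs, a system cannot reach a high F-measure by maximizing only one of precision and recall. A minimal Python sketch; the function name is hypothetical, and returning 0.0 when both inputs are zero is a convention assumed here.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the balanced F-measure)."""
    if precision + recall == 0:
        return 0.0  # convention assumed here to avoid dividing by zero
    return 2 * precision * recall / (precision + recall)


balanced = f_measure(0.5, 0.5)     # equal inputs give the same value back
lopsided = f_measure(1.0, 0.1)     # high precision cannot offset low recall
print(balanced, lopsided)
```

In the second call the arithmetic mean of the inputs would be 0.55, while the harmonic mean is roughly 0.18.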

Mean average precision

Over a set of queries, find the mean of the average precisions, where Average Precision is the average of the precision values obtained after each relevant document is retrieved.

Where r is the rank, N the number retrieved, rel() a binary function on the relevance of a given rank, and P() precision at a given cut-off rank:

$$\text{AveP} = \frac{\sum_{r=1}^{N} \big(P(r) \times \text{rel}(r)\big)}{\text{number of relevant documents}}$$

This method emphasizes returning more relevant documents earlier.
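Average precision and its mean over several queries can be sketched in Python as follows. The function names and the sample rankings are assumptions made for illustration; the sketch divides by the total number of relevant documents for the query, so relevant documents that are never retrieved lower the score.

```python
def average_precision(ranked, relevant):
    """Sum of P(r) at each rank r holding a relevant document,
    divided by the total number of relevant documents."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # convention assumed here when nothing is relevant
    hits = 0
    total = 0.0
    for r, doc in enumerate(ranked, start=1):
        if doc in relevant:    # rel(r) = 1
            hits += 1
            total += hits / r  # P(r): precision at cut-off rank r
    return total / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(rk, rel) for rk, rel in runs) / len(runs)


# Made-up data: the same five documents in two different orders.
relevant = {"d1", "d2", "d5"}
late = ["d3", "d1", "d7", "d2", "d9"]    # relevant documents at ranks 2 and 4
early = ["d1", "d2", "d3", "d7", "d9"]   # relevant documents at ranks 1 and 2

print(average_precision(late, relevant))
print(average_precision(early, relevant))
print(mean_average_precision([(late, relevant), (early, relevant)]))
```

Both rankings retrieve the same two relevant documents, but the one that places them first scores twice as high (2/3 against 1/3).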

Model types
For successful IR, it is necessary to represent the documents in some way. There are a number of models for this purpose, roughly divisible into three main groups:

Set-theoretic / Boolean models
  Standard Boolean model
  Extended Boolean model
  Fuzzy retrieval
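In the standard Boolean model, each term corresponds to the set of documents that contain it, and a query combines those sets with AND, OR and NOT. A minimal Python sketch, assuming whitespace tokenization and a made-up three-document collection; `build_index` and `postings` are hypothetical helper names.

```python
def build_index(docs):
    """Map each term to the set of ids of the documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index


# Made-up collection for illustration.
docs = {
    1: "boolean retrieval uses set operations",
    2: "vector space retrieval ranks documents",
    3: "fuzzy sets extend boolean logic",
}
index = build_index(docs)
all_ids = set(docs)


def postings(term):
    """Set of documents containing the term (empty if the term is unknown)."""
    return index.get(term, set())


both = postings("boolean") & postings("retrieval")           # boolean AND retrieval
either = postings("boolean") | postings("vector")            # boolean OR vector
without = postings("retrieval") & (all_ids - postings("boolean"))  # retrieval AND NOT boolean

print(both, either, without)
```

A document either satisfies the query or it does not, so the result is an unranked set; the extended Boolean and fuzzy variants relax this by allowing partial matches.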

Algebraic / vector space models
  Vector space model
  Generalized vector space model
  Topic-based vector space model
  Enhanced topic-based vector space model
  Latent semantic indexing (also known as latent semantic analysis)
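In the basic vector space model, documents and queries are represented as term-weight vectors and ranked by the cosine of the angle between them. The Python sketch below uses raw term counts as weights, which is a simplifying assumption (practical systems usually apply a weighting scheme such as tf-idf); the function names and the sample collection are made up for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Sparse term-count vector for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (score, doc_id) pairs sorted from most to least similar."""
    q = tf_vector(query)
    scored = [(cosine(q, tf_vector(text)), doc_id) for doc_id, text in docs.items()]
    return sorted(scored, reverse=True)


# Made-up collection for illustration.
docs = {
    "d1": "information retrieval systems",
    "d2": "cooking recipes",
    "d3": "information theory",
}

for score, doc_id in rank("information retrieval", docs):
    print(doc_id, round(score, 3))
```

Unlike a Boolean match, every document receives a graded score, so partial matches such as d3 are still ranked above documents sharing no terms with the query.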

Probabilistic models
  Binary independence retrieval
  Uncertain inference
  Language models
  Divergence from randomness models
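As one example from the probabilistic group, a language-model approach can score each document by the probability that its term distribution would generate the query. The Python sketch below mixes document and collection statistics with a fixed weight so that a query term missing from a document does not zero out the score; this smoothing choice, the weight `lam=0.5`, the function name and the sample data are all assumptions made for illustration, not details taken from the text.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc_tokens, coll_counts, coll_len, lam=0.5):
    """Log-probability of the query under a document language model
    interpolated with the collection model (lam is an assumed weight)."""
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = coll_counts[term] / coll_len if coll_len else 0.0
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0:
            return float("-inf")  # term unseen anywhere in the collection
        score += math.log(p)
    return score


# Made-up tokenized collection for illustration.
docs = {
    "d1": ["information", "retrieval", "models"],
    "d2": ["language", "models", "of", "text"],
}
coll_counts = Counter(t for tokens in docs.values() for t in tokens)
coll_len = sum(coll_counts.values())

query = ["retrieval", "models"]
scores = {d: query_log_likelihood(query, toks, coll_counts, coll_len)
          for d, toks in docs.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Scores are log-probabilities, so they are negative and only their relative order matters for ranking.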

Open source information retrieval systems
Lemur (http://www.lemurproject.org/): Language Modelling IR Toolkit
Lucene (http://lucene.apache.org/java/docs/): Apache Jakarta project
SMART (ftp://ftp.cs.cornell.edu/pub/smart/): Early IR engine from Cornell University
Terrier (http://ir.dcs.gla.ac.uk/terrier): Information Retrieval Platform
Xapian (http://www.xapian.org/): Open source IR platform based on Muscat
Zettair (http://www.seg.rmit.edu.au/zettair/)
ht://dig (http://www.htdig.org/): Open source web crawling software
MG full-text retrieval system (http://www.nzdl.org/html/mg.html): Now maintained by the Greenstone Digital Library Software Project
Information Storage and Retrieval Using Mumps (http://www.cs.uni.edu/~okane/source/ISR/isr.html): Online GPL text

Major information retrieval research groups
Glasgow Information Retrieval Group (http://ir.dcs.gla.ac.uk)
Center for Intelligent Information Retrieval (http://ciir.cs.umass.edu/)
IIT Information Retrieval Lab (http://www.ir.iit.edu/)
CIR Centre for Information Retrieval (http://www.dcs.vein.hu/CIR/)

Major figures in information retrieval
Calvin Mooers
Eugene Garfield
Gerard Salton
W. Bruce Croft
Karen Spärck Jones
C. J. van Rijsbergen
S. Dominich

Awards in this field:
Tony Kent Strix award

ACM SIGIR Gerard Salton Award
1983 - Gerard Salton, Cornell University: "About the future of automatic information retrieval"
1988 - Karen Spärck Jones, University of Cambridge: "A look back and a look forward"
1991 - Cyril Cleverdon, Cranfield Institute of Technology: "The significance of the Cranfield tests on index languages"
1994 - William S. Cooper, University of California, Berkeley: "The formalism of probability theory in IR: a foundation or an encumbrance?"
1997 - Tefko Saracevic, Rutgers University: "Users lost: reflections on the past, future, and limits of information science"
2000 - Stephen E. Robertson, City University London: "On theoretical argument in information retrieval"
2003 - W. Bruce Croft, University of Massachusetts, Amherst: "Information retrieval and computer science: an evolving relationship"

Knowledge Navigation Suite
A suite of information indexing and classification tools that supports information sharing and textual data mining based on natural language processing, statistical pattern analysis, and neural network techniques. Supports terabyte-scale data analysis and visualization.

Center for Networked Information Discovery and Retrieval
Information on the team, history and projects of this research lab.

Willow
A now-discontinued Z39.50 bibliographic information retrieval tool from the University of Washington.

The Glasgow Information Retrieval Group
Has a research program aimed at giving better access to multi-media information.

Information Retrieval
An online book by C. J. van Rijsbergen, University of Glasgow.

Text REtrieval Conference (TREC)
An annual information retrieval conference and competition, the purpose of which is to support and further research within the information retrieval community.

The Center for Intelligent Information Retrieval
University of Massachusetts research lab focused on efficient access to large, heterogeneous, distributed, text and multimedia databases.

MultiCentrix
Software for information mapping, knowledge management, and computer aided thinking.

The unCommon Hub
A suite of components that, when connected and used in concert with one another, provide a powerful information management and data integration tool.

Information Retrieval Research
An up-to-date overview of research in the field of information retrieval.







© 2005 GeneralAnswers.org