User:KYPark/002

A DIRECT APPROACH TO INFORMATION RETRIEVAL

Table of Contents

   WHAT
   WHY
   HOW
1. INTRODUCTION
2. THE LINE OF ATTACK
3. SYSTEMS VS. USERS
   3.1 Discrimination
   3.2 Prediction
4. DOCUMENTS VS. SURROGATES
5. THE THEORY OF INTERPRETATION
   5.1 Denotation and Connotation
   5.2 The Theory of Ogden and Richards
   5.3 Implications for Information Retrieval
6. PROPOSAL FOR FILE ORGANIZATION
   6.1 Incentives
   6.2 Extracts as Indexing Sources
   6.3 Extracts as Review Sources
7. CONCLUSION
8. REFERENCES

2. THE LINE OF ATTACK

The overall view of main retrieval events may be represented schematically as shown in Figure 1. It may be said here that:

S-d (substitution): The system S substitutes d for a document D for the purpose of notification and prediction.
d-U (notification): The user U is notified of a document D through d, and discerns what the document D is about.
U-E (interaction): The user U interacts with the system S, giving evidence E either of his information need or of his satisfaction of that need.
E-S (inference): The system S makes inferences from the evidence E, either making a search formulation, or evaluating its performance.
S-D (prediction): The system S predicts a relevant document D, based on d and the search formulation.
D-U (discrimination): The user U discriminates the document D in the light of his information need.

Figure 1. Schematic View of Information Retrieval Events.
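
As a reading aid only, the six events listed above can be treated as directed edges between the elements S, d, D, U and E. The sketch below is not part of the original study; the names EVENTS and cycles_from are hypothetical and are introduced purely for illustration. It enumerates the closed paths that start and end at the system S, assuming each event runs in the direction its label is written (S-d means from S to d).

```python
# Hypothetical encoding of the six retrieval events as (source, target, name).
# The direction of each edge follows the order of the label, e.g. "S-d".
EVENTS = [
    ("S", "d", "substitution"),
    ("d", "U", "notification"),
    ("U", "E", "interaction"),
    ("E", "S", "inference"),
    ("S", "D", "prediction"),
    ("D", "U", "discrimination"),
]


def cycles_from(start):
    """Return every simple closed path that leaves `start` and returns to it."""
    found = []

    def walk(node, path):
        for src, dst, _name in EVENTS:
            if src != node:
                continue
            if dst == start:
                found.append(path + [dst])      # the path closes on itself
            elif dst not in path:
                walk(dst, path + [dst])         # extend without revisiting

    walk(start, [start])
    return found


if __name__ == "__main__":
    for cycle in cycles_from("S"):
        print(" -> ".join(cycle))
```

Under these assumptions the events form two closed paths through S: one passing through the substitute d and one passing through the document D. Both return to the system through the user U and the evidence E.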

Information retrieval is a complex type of communication between the system and the user, and the schematic diagram in Figure 1 shows the situation only roughly. Admittedly, the diagram is too simple and crude to explain information retrieval meaningfully; it will be expanded in Chapter 5. Meanwhile, it suffices to show how retrieval problems may be approached.

What we ultimately want to know is the relationship between the system and the user, which is represented in Figure 1 by the solid arrows and characterized by prediction and discrimination of documents. Many other relationships can also be considered in the diagram, for example those represented by the dotted arrows and the broken arrows. We can reasonably assert, however, that knowledge of these other relationships should serve to explicate the most important one, that between the system and the user.

On the other hand, information retrieval may be possible with little or no knowledge of the relationship between the system and the user. That is to say, we can enclose the system and the user in a black box*, perform information retrieval, and improve the performance successively by feedback control. The solid arrows and the dotted arrows together form a closed cycle for feedback control. The black box has two input terminals, Ein and Din, which are the inputs to the system and the user, respectively. It also has two output terminals: one through which the user gives Eout, first in search of and then in response to Din, and the other through which the system retrieves Dout in response to Ein. This principle is illustrated in Figure 2, where Po represents the given initial condition, or set of performance factors, of the black box.

Figure 2. Feedback Control of Information Retrieval.
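
The feedback principle can be illustrated with a deliberately small toy model. Everything in the sketch below is an assumption made for illustration and does not come from the original study: the document scores, the set of relevant documents, the three-valued evidence signal, and the use of a single score cutoff to stand in for the initial condition Po. The outer loop never inspects how the box works; it only connects Dout to Din and Eout to Ein.

```python
# Toy model of the black box in Figure 2. All values are hypothetical.
DOCS = {"a": 90, "b": 70, "c": 50, "d": 30}   # scores the system assigns
RELEVANT = {"a", "b"}                          # known to the user only


class BlackBox:
    def __init__(self, p0):
        self.cutoff = p0                       # stands in for Po

    def system(self, e_in):
        """Ein -> Dout: shift the cutoff by the evidence, then retrieve."""
        self.cutoff += 10 * e_in
        return {doc for doc, score in DOCS.items() if score >= self.cutoff}

    def user(self, d_in):
        """Din -> Eout: +1 for unwanted items, -1 for missing items, 0 if satisfied."""
        if d_in - RELEVANT:
            return 1
        if RELEVANT - d_in:
            return -1
        return 0


def run_feedback(p0, max_rounds=20):
    """Close the loop until the user reports satisfaction or rounds run out."""
    box = BlackBox(p0)
    evidence = 0
    for round_no in range(1, max_rounds + 1):
        retrieved = box.system(evidence)       # Ein -> Dout
        evidence = box.user(retrieved)         # Dout fed back as Din -> Eout
        if evidence == 0:
            return retrieved, round_no
    return retrieved, max_rounds


if __name__ == "__main__":
    print(run_feedback(20))
    print(run_feedback(100))
```

In this toy setting the loop settles on the same output from either starting value, and only the number of rounds differs. With a less convenient arrangement of scores the same loop could oscillate instead of settling, which the sketch makes no attempt to handle.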

Whether or not this principle is possible and practicable, it almost certainly would not tell us much that is meaningful about the relationship between the system and the user. In other words, it is not necessarily suitable for explicating that relationship. Even if it is suitable, it can explain the relationship only indirectly, i.e., through inferences from a great deal of valid and consistent evidence.

The approach that has been overwhelmingly used in the field of information retrieval is very similar to this principle. The main difference is that the initial condition Po is varied in many ways in order to find which initial condition gives the optimum performance of the system. This approach is not really intended to reveal the relationship between the system and the user. The other, direct approach will be attempted in this study.

* "I shall understand by a black box a piece of apparatus, such as four-terminal networks with two input and two output terminals, which performs a definite operation on the present and past of the input potential, but for which we do not necessarily have any information of the structure by which this operation is performed. On the other hand, a white box will be similar network in which we have built in the relation between input and output potentials in accordance with a definite structural plan for securing a previously determined input-output relation." -- Norbert Wiener, Cybernetics³

AFTERTHOUGHTS

See also

Black box