How Results Are Found and Displayed
Documents are found based on the criteria described below, and they are displayed in reverse order by NCJ number up to the number of documents specified by the user. This order ensures that the most current documents relevant to the search criteria appear at the top of the list.
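The display order described above can be sketched in a few lines. This is only an illustration: the function name `display_order` and the sample numbers are hypothetical, and it assumes NCJ numbers can be compared as integers, with a higher number meaning a more recent document.

```python
def display_order(ncj_numbers, limit):
    """Return up to `limit` NCJ numbers, highest (assumed most recent) first."""
    return sorted(ncj_numbers, reverse=True)[:limit]


# Hypothetical NCJ numbers of documents that matched a search.
matches = [198342, 201115, 187650, 210004]
print(display_order(matches, 3))  # [210004, 201115, 198342]
```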
Results of Boolean Searches
A Boolean search finds exactly the words a user types. If the word or combination of words is found anywhere in a document, that document is included in the search results.
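A minimal sketch of this kind of exact-word matching follows. It assumes the simplest reading, in which every typed word must appear somewhere in the document, and it ignores case; both are assumptions made for illustration, and the helper names `words_in` and `boolean_match` are hypothetical.

```python
import re


def words_in(text):
    """Split text into a set of lowercase words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def boolean_match(query, document):
    """True when every word typed in the query occurs somewhere in the document."""
    wanted = words_in(query)
    return bool(wanted) and wanted <= words_in(document)


title = "Evaluation of a Drug Court Program"
print(boolean_match("drug court", title))   # True
print(boolean_match("drug courts", title))  # False: "courts" is not in the text
```

The second call shows the practical consequence of exact matching: a plural or other variant of a word is not found unless it is typed as it appears in the document.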

Results of Concept and Pattern Searches
As a first step, the search simply looks for the presence or absence of query words or related terms in the document. The calculation process then evaluates several factors, and each factor adds a certain relative "weight" to the document. Added together, these weights determine a document's relevance. If a document has any relevance to the search criteria, no matter how small, that document is included in the results set. Weights are determined by the following factors:
- Completeness: The larger the number of query words found in the document (either exactly or through related terms), the greater the weight. A relevant document should contain at least one term or related term for each word in the query. If the document contains only a fraction of the original words, then the maximum rank of the document is equal to this fraction. For example, if the document contains only 3 out of the 4 original terms in the query, then its maximum rank is 75 percent. Related terms contribute less weight than the original (exact) words. If a query consists of three query terms, a document containing one instance of each of the three words will be ranked higher than a document containing 100 instances of one of the query terms.
- Contextual Evidence: The larger the number of related terms, the greater the weight. Words are supported by their related terms. If a document contains a word and its related terms, the word is given a greater weight because it is surrounded by supporting evidence. For example, the word "charge" near the words "credit," "debt," and "card" is more likely to mean "charge card" than to mean "ward," "battery energy," or "to assign a task."
- Semantic Distance: The more closely related the terms, the greater the weight. For example, synonyms are more closely related than antonyms. This association is used to compute the amount of contextual evidence that supports a word. The closer the terms are in relationship to the query words, the more weight they are given.
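The completeness limit can be illustrated with a short sketch. It counts exact words only and leaves out related terms, which the search also credits at a lower weight; the function name `completeness_cap` and the sample text are hypothetical.

```python
def completeness_cap(query_terms, document_terms):
    """Fraction of distinct query terms that appear in the document.

    Under the completeness rule, this fraction is the highest rank
    the document can reach.
    """
    query = {t.lower() for t in query_terms}
    if not query:
        return 0.0
    present = {t.lower() for t in document_terms}
    return len(query & present) / len(query)


# Three of the four query terms appear, so the maximum rank is 75 percent.
query = ["juvenile", "court", "sentencing", "reform"]
doc = "The juvenile court reviewed sentencing guidelines".split()
print(completeness_cap(query, doc))  # 0.75

# One instance of each of three terms outranks 100 instances of one term.
three_terms = ["arson", "fraud", "theft"]
one_of_each = ["arson", "fraud", "theft"]
one_repeated = ["arson"] * 100
print(completeness_cap(three_terms, one_of_each) >
      completeness_cap(three_terms, one_repeated))  # True
```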
To further refine this semantic relevance, the search looks at both the physical location of query words and related terms within the document and the total number of terms. By default, a document's rank is calculated using a formula that equally combines the factors described below.
- Proximity: The closer together the query words and related terms are within the document, the greater the weight. A document is judged more relevant if it contains related terms that occur close together, preferably in the same sentence or paragraph. The system computes a factor for physical proximity, which increases for adjacent terms and lessens as terms become increasingly distant (physically) from each other. Thus, documents with many hits close together are ranked higher than documents in which those same hits are present but scattered far apart.
- Hit Density: The greater the ratio of query words and related terms to the total number of words in the document, the greater the weight. A document is judged more relevant if a large portion of the total number of words in it are query words or related terms. Thus, short documents with many hits are ranked higher than longer documents that have the same number of hits.
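The two factors above can be sketched as follows. The search's actual formula is not given here, so the proximity measure (the average of 1/gap between neighboring hits) and the equal-weight average in `refined_score` are assumptions chosen only to show the behavior described; all function names are hypothetical, and related terms are left out.

```python
def hit_positions(words, query_terms):
    """Positions in the document where a query term occurs."""
    terms = {t.lower() for t in query_terms}
    return [i for i, w in enumerate(words) if w.lower() in terms]


def hit_density(words, query_terms):
    """Ratio of hits to the total number of words in the document."""
    if not words:
        return 0.0
    return len(hit_positions(words, query_terms)) / len(words)


def proximity(words, query_terms):
    """Average of 1/gap between neighboring hits (assumed measure).

    Equals 1.0 when all hits are adjacent and falls toward 0.0 as
    the hits move farther apart. Returns 0.0 with fewer than two hits.
    """
    pos = hit_positions(words, query_terms)
    if len(pos) < 2:
        return 0.0
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    return sum(1 / g for g in gaps) / len(gaps)


def refined_score(words, query_terms):
    """Combine the two factors with equal weight (assumed formula)."""
    return (proximity(words, query_terms) + hit_density(words, query_terms)) / 2


query = ["police", "training"]
close = "police training programs improve outcomes".split()
scattered = "police officers in rural areas often lack access to training".split()

# Same two hits, but adjacent and in a shorter document: ranked higher.
print(refined_score(close, query) > refined_score(scattered, query))  # True
```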
Last updated on: 4/3/2007