To be able to test my interface, I was looking for data. To my (not
very great) surprise, there is no testbed for new technology in this
field. If you want to do anything with web pages, for any research or
application, you have to implement all parts of a spider yourself. This
is very costly and time-consuming, and it needs improvement.
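To give an impression of what "all parts of a spider" involves, the
sketch below shows the bare minimum in Python: fetch a page, extract
its links, and queue them breadth-first. It is only an illustration;
the start URL, the page limit, and all names are assumptions, not part
of any existing system.

    # Minimal illustrative spider: fetch a page, extract its links, and
    # follow them breadth-first. The start URL and limits are hypothetical.
    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkExtractor(HTMLParser):
        """Collect the href targets of all <a> tags on a page."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(start_url, max_pages=10):
        """Breadth-first crawl from start_url, fetching at most max_pages pages."""
        seen = {start_url}
        queue = deque([start_url])
        fetched = 0
        while queue and fetched < max_pages:
            url = queue.popleft()
            try:
                with urlopen(url, timeout=10) as response:
                    html = response.read().decode("utf-8", errors="replace")
            except OSError:
                continue  # unreachable or broken page: skip it
            fetched += 1
            parser = LinkExtractor()
            parser.feed(html)
            for href in parser.links:
                absolute = urljoin(url, href)
                if absolute.startswith("http") and absolute not in seen:
                    seen.add(absolute)
                    queue.append(absolute)
            yield url, html  # a real spider would hand the page to an indexer here

    if __name__ == "__main__":
        for page_url, _content in crawl("http://example.com/"):
            print("fetched", page_url)

Even this toy version leaves out everything that makes a real spider
expensive: politeness delays, duplicate detection, error handling,
storage, and the indexing itself.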
There are three groups of indexing systems on the current Internet:
- Manually maintained indexes, where people add site descriptions and
  sites are categorized. Their usefulness is diminishing, as shown
  before. An example is Yahoo!.
- Distributed manual indexes, implemented like a library with many
  places where people add information to web pages to improve the
  retrievability of the information. Many of these systems exist in
  research projects, such as DESIRE and ROADS. They usually require a
  piece of extra software to be installed on each of the participating
  web sites.
- Fully automatic indexes are the most widespread. They index all
  pages (sometimes limited to a country, sometimes the whole Internet)
  without any manual intervention. An example is AltaVista.
Most people use the general, fully automatic engines. However, there
are about 480 of them, each of which tries to gather the information
on its own. This is one of the reasons why most Internet sites have
closed their doors to search engines: they do not want all those
engines slowing down access for their normal users.
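The usual way a site closes its doors is the Robots Exclusion Protocol:
a robots.txt file that well-behaved spiders consult before fetching
anything. A minimal sketch of that check, using Python's standard
urllib.robotparser; the site and the user agent name are hypothetical:

    # Check a site's robots.txt before fetching a page. The URL and the
    # user agent "MySpider/1.0" are made-up examples.
    from urllib.robotparser import RobotFileParser

    robots = RobotFileParser("http://example.com/robots.txt")
    robots.read()  # download and parse the site's robots.txt

    if robots.can_fetch("MySpider/1.0", "http://example.com/some/page.html"):
        print("allowed to fetch")
    else:
        print("the site has closed this path to our spider")

A site that is crawled by hundreds of engines has a strong incentive to
disallow most of them in exactly this way.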
Mark A.C.J. Overmeer,
AT Computing bv, 1999.