The need for a new setup is acute. A proposed schedule for transforming the
way we search for information on the web is presented below:
- Start some fetchers. Especially for small languages, this
is rather inexpensive and easily done.
For the English language, more work is needed: one server is not enough
to scan all web sites.
Some free software developers have already produced packages
that can be deployed without many changes; a minimal fetcher is
sketched after this list.
- The storage servers have to exchange information, for instance when they
find pages in a language they do not serve. This protocol still has to be
developed, but it can be based on ordinary HTTP for the communication (a
sketch of such an exchange follows after this list).
- The interface where administrators can register their sites and specify
how they should be downloaded should encourage sites to open their doors
to the central fetcher. This should be combined with promotional
activities; a hypothetical registration record follows after this list.
- Modules for the public and private extractors and collectors are
already widespread. In the Perl libraries (CPAN), for instance,
quite a few useful modules can be found.
The interface to the storage can be very simple, on a file-by-file
basis (see the extractor sketch after this list).
- The user interface, built on existing modules, can start with a simple
plain-text search and textual output, although alternatives are being
developed. There will be simple interfaces with rough results that anyone
can use, but also complex search methods designed for trained librarians
and other specialists; a minimal search sketch follows after this list.
An experimental version of such an interface is described in my other
paper.
- When a first simple implementation is ready, new spiders should be
encouraged to use the configuration. It might be easy to attract smaller
spiders, simply because they save a lot of work writing code as well as
the costs of disk space and network access.
- When more and more sites open up to central fetching, the existing
spiders will become more willing to change. Large engines, however, might
never be willing to commit to the proposed structure.
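
To make the first step concrete, here is a minimal sketch of a fetcher in
Perl, using modules from CPAN. The user-agent name, the contact address,
and the store/ directory are illustrative assumptions, not part of the
proposal:

    #!/usr/bin/perl
    # Minimal fetcher sketch: reads URLs from standard input and stores
    # each page as one file, named after a hash of its URL.
    use strict;
    use warnings;
    use LWP::RobotUA;
    use Digest::MD5 qw(md5_hex);

    # Identify ourselves; LWP::RobotUA also honours robots.txt.
    my $ua = LWP::RobotUA->new('central-fetcher/0.1', 'admin@example.org');
    $ua->delay(1/60);    # wait at least one second between requests

    while (my $url = <STDIN>) {    # one URL per line
        chomp $url;
        my $resp = $ua->get($url);
        next unless $resp->is_success;
        # File-by-file storage: one page, one file.
        my $file = 'store/' . md5_hex($url);
        open my $fh, '>', $file or die "cannot write $file: $!";
        print {$fh} $resp->decoded_content;
        close $fh;
    }

Because LWP::RobotUA honours the robots.txt conventions, even this small
sketch behaves politely towards the sites it visits.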
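
For the exchange between storage servers, the sketch below assumes that
each language is served by one known host and that records are handed
over with an ordinary HTTP POST. The host names and the /handover path
are hypothetical; the real protocol still has to be developed:

    # Storage-server exchange sketch: forward a page found in a
    # language this server does not serve to the server that does.
    use strict;
    use warnings;
    use LWP::UserAgent;

    my %server_for = (    # hypothetical language-to-server map
        de => 'http://de.store.example.org',
        fr => 'http://fr.store.example.org',
    );

    sub hand_over {
        my ($language, $url, $content) = @_;
        my $server = $server_for{$language} or return;  # nobody serves it
        my $ua = LWP::UserAgent->new;
        # Ordinary HTTP carries the whole exchange.
        my $resp = $ua->post("$server/handover",
            { url => $url, language => $language, content => $content });
        warn 'handover failed: ' . $resp->status_line
            unless $resp->is_success;
    }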
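
The registration interface could let an administrator describe a site in
a small record such as the hypothetical one below; every field name is
illustrative only, since the actual interface still has to be designed:

    # Hypothetical registration record for one site.
    site:     http://www.example.org/
    language: nl
    depth:    3                    # follow links at most three levels deep
    exclude:  /cgi-bin/ /private/  # paths the fetcher must not touch
    revisit:  weekly               # how often the fetcher should return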
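
An extractor working on the file-by-file storage can be equally small.
The sketch below uses HTML::LinkExtor from CPAN to pull the links out of
one stored page, assuming the storage layout of the fetcher sketch above:

    # Extractor sketch: print all links found in one stored page.
    use strict;
    use warnings;
    use HTML::LinkExtor;

    my $file = shift @ARGV or die "usage: extract.pl <stored-page>\n";
    my @links;
    my $parser = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        push @links, $attr{href} if $tag eq 'a' && $attr{href};
    });
    $parser->parse_file($file);
    print "$_\n" for @links;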
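
Finally, a sketch of the simple plain-text search. Scanning every stored
page for the query words is far too slow for a real service, where an
inverted index would be needed, but it shows how simple the first user
interface can be:

    # Plain-text search sketch: print the stored pages that contain
    # every word given on the command line.
    use strict;
    use warnings;

    my @words = map { lc } @ARGV or die "usage: search.pl word ...\n";
    opendir my $dh, 'store' or die "no store directory: $!";
    FILE: for my $file (grep { -f "store/$_" } readdir $dh) {
        open my $fh, '<', "store/$file" or next;
        my $text = do { local $/; lc <$fh> };   # slurp the whole page
        for my $word (@words) {
            next FILE if index($text, $word) < 0;
        }
        print "store/$file\n";
    }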