The need for a new setup is acute. A proposed schedule for transforming the
way we search for information on the web is presented below:
- Start some fetchers. Especially for small languages, this
is rather inexpensive and easily done.
For the English language, more work is needed: one server is not enough
to scan all web sites.
Some free software developers have already produced packages
that can be deployed without many changes; a minimal fetcher is
sketched after this list.
- The storage servers have to exchange information, for instance when they
find pages in a language they do not serve. This protocol still has to be
developed, but it can be based on ordinary HTTP for the communication (a
sketch of such an exchange follows after this list).
- The interface where administrators can register their sites and specify
how they should be downloaded should encourage sites to open their doors
to the central fetcher. This should be combined with promotional
activities; a hypothetical registration record follows after this list.
- Modules for the public and private extractors and collectors are
already widespread. In the Perl libraries (CPAN), for instance,
quite a few useful modules can be found.
The interface to the storage can be very simple, on a file-by-file
basis (see the extractor sketch after this list).
- The user interface, built on existing modules, can start with a simple
plain-text search and textual output, although alternatives are being
developed. There will be simple interfaces with rough results that anyone
can use, but also complex search methods designed for trained librarians
and other specialists; a minimal search sketch follows after this list.
An experimental version of such an interface is described in my other
paper.
- When a first simple implementation is ready, new spiders should be
encouraged to use the configuration. It might be easy to attract smaller
spiders, simply because they save a lot of work writing code as well as
the costs of disk space and network access.
- When more and more sites open up to central fetching, the existing
spiders will become more willing to change. Large engines, however, might
never be willing to commit to the proposed structure.
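
To make the first step concrete, here is a minimal sketch of a fetcher in
Perl, using modules from CPAN. The user-agent name, the contact address,
and the store/ directory are illustrative assumptions, not part of the
proposal:

    #!/usr/bin/perl
    # Minimal fetcher sketch: reads URLs from standard input and stores
    # each page as one file, named after a hash of its URL.
    use strict;
    use warnings;
    use LWP::RobotUA;
    use Digest::MD5 qw(md5_hex);

    # Identify ourselves; LWP::RobotUA also honours robots.txt.
    my $ua = LWP::RobotUA->new('central-fetcher/0.1', 'admin@example.org');
    $ua->delay(1/60);    # wait at least one second between requests

    while (my $url = <STDIN>) {    # one URL per line
        chomp $url;
        my $resp = $ua->get($url);
        next unless $resp->is_success;
        # File-by-file storage: one page, one file.
        my $file = 'store/' . md5_hex($url);
        open my $fh, '>', $file or die "cannot write $file: $!";
        print {$fh} $resp->decoded_content;
        close $fh;
    }

Because LWP::RobotUA honours the robots.txt conventions, even this small
sketch behaves politely towards the sites it visits.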
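
For the exchange between storage servers, the sketch below assumes that
each language is served by one known host and that records are handed
over with an ordinary HTTP POST. The host names and the /handover path
are hypothetical; the real protocol still has to be developed:

    # Storage-server exchange sketch: forward a page found in a
    # language this server does not serve to the server that does.
    use strict;
    use warnings;
    use LWP::UserAgent;

    my %server_for = (    # hypothetical language-to-server map
        de => 'http://de.store.example.org',
        fr => 'http://fr.store.example.org',
    );

    sub hand_over {
        my ($language, $url, $content) = @_;
        my $server = $server_for{$language} or return;  # nobody serves it
        my $ua = LWP::UserAgent->new;
        # Ordinary HTTP carries the whole exchange.
        my $resp = $ua->post("$server/handover",
            { url => $url, language => $language, content => $content });
        warn 'handover failed: ' . $resp->status_line
            unless $resp->is_success;
    }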
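
The registration interface could let an administrator describe a site in
a small record such as the hypothetical one below; every field name is
illustrative only, since the actual interface still has to be designed:

    # Hypothetical registration record for one site.
    site:     http://www.example.org/
    language: nl
    depth:    3                    # follow links at most three levels deep
    exclude:  /cgi-bin/ /private/  # paths the fetcher must not touch
    revisit:  weekly               # how often the fetcher should return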
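
An extractor working on the file-by-file storage can be equally small.
The sketch below uses HTML::LinkExtor from CPAN to pull the links out of
one stored page, assuming the storage layout of the fetcher sketch above:

    # Extractor sketch: print all links found in one stored page.
    use strict;
    use warnings;
    use HTML::LinkExtor;

    my $file = shift @ARGV or die "usage: extract.pl <stored-page>\n";
    my @links;
    my $parser = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        push @links, $attr{href} if $tag eq 'a' && $attr{href};
    });
    $parser->parse_file($file);
    print "$_\n" for @links;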
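
Finally, a sketch of the simple plain-text search. Scanning every stored
page for the query words is far too slow for a real service, where an
inverted index would be needed, but it shows how simple the first user
interface can be:

    # Plain-text search sketch: print the stored pages that contain
    # every word given on the command line.
    use strict;
    use warnings;

    my @words = map { lc } @ARGV or die "usage: search.pl word ...\n";
    opendir my $dh, 'store' or die "no store directory: $!";
    FILE: for my $file (grep { -f "store/$_" } readdir $dh) {
        open my $fh, '<', "store/$file" or next;
        my $text = do { local $/; lc <$fh> };   # slurp the whole page
        for my $word (@words) {
            next FILE if index($text, $word) < 0;
        }
        print "store/$file\n";
    }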