For each of the three word-groups, special limiters can be set.
The limiters currently apply only to whole word-groups, not to single words.
This may change in future versions of the interface if experiments
show that working with groups does not give sufficient detail.
The effects of the limiters are all shown as histograms
(the data displayed in the figure is only an example), with controls to
adjust the lower and upper bound for your request, to choose linear or
logarithmic scaling, and to choose a cumulative, spike, or Gaussian presentation.
Each word-group has a column (keywords, related words, and forbidden
words, respectively). Selecting a box sets a limiter. A stopping hand signals
that the limiter is temporarily disabled. The value in the box shows
the effect of the limiter on the number of answers to the question.
At the moment, the following limiters are intended:
- Hits per page per site.
The average number of hits per page that was hit, in relation to the number
of sites with hits. This visualizes the density of the hits with
respect to sites.
- Hits per site.
The total number of hits for a site, in relation to the number of sites
which have that many hits. With this limiter, you can find sites
with a large amount of information about the subject, independent of
the size of the site.
- Pages hit per site.
The total number of pages of a site which contain any of the words, in
relation to the number of sites which have that many pages hit. This
indicates sites which are likely to have a specialized section about the subject.
- Pages hit percentage per site.
The number of pages hit in a site as a percentage of the pages of the
whole site, in relation to the number of sites which have that word. This
limiter can be used to find sites specialized in the subject you are
looking for.
- Limiter on location.
This is not a histogram, but a checklist of options to restrict
the appearance of words to (a combination of)
- the title of the page,
- the meta-keyword line in HTML-pages,
- the meta-description line in HTML-pages, and
- the content of the page.
This limiter is not yet implemented.
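
The four histogram limiters listed above can be sketched in a few lines of Python. This is only an illustration, not the interface's implementation: the `SiteStats` record, the metric names, and the fixed-width binning are all assumptions, and the sketch presumes that the spider can report, per site and per word-group, how many hits each page has.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SiteStats:
    """Hypothetical per-site record for one word-group."""
    total_pages: int     # number of pages known for the whole site
    hits_per_page: dict  # page URL -> number of hits (only pages that were hit)


def site_metrics(site):
    """Compute the four quantities the histogram limiters are based on."""
    pages_hit = len(site.hits_per_page)
    hits = sum(site.hits_per_page.values())
    return {
        "hits_per_page": hits / pages_hit if pages_hit else 0.0,
        "hits": hits,
        "pages_hit": pages_hit,
        "pages_hit_pct": (100.0 * pages_hit / site.total_pages
                          if site.total_pages else 0.0),
    }


def histogram(sites, metric, bin_width=1.0):
    """Map each bin index to the number of sites whose metric falls in it."""
    counts = Counter()
    for site in sites:
        if not site.hits_per_page:
            continue  # sites without hits do not appear in the histogram
        value = site_metrics(site)[metric]
        counts[int(value // bin_width)] += 1
    return dict(sorted(counts.items()))
```

For example, `histogram(sites, "pages_hit_pct", bin_width=10.0)` would group sites into bands of ten percent, which is one way to look for sites specialized in the subject.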
More than one limiter can be set at the same time, for any of the three
groups of words.
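
As a rough sketch of how several limiters might be combined, the following Python treats each limiter as a lower and upper bound on one metric of one word-group and keeps only the sites that satisfy every enabled limiter. The `Limiter` class, the group labels, and the nested-dictionary layout of the site data are hypothetical; the `enabled` flag stands in for the stopping hand.

```python
from dataclasses import dataclass


@dataclass
class Limiter:
    """Hypothetical limiter: bounds on one metric of one word-group."""
    group: str            # e.g. "keywords", "related", "forbidden" (assumed labels)
    metric: str           # e.g. "hits" or "pages_hit" (assumed names)
    lower: float
    upper: float
    enabled: bool = True  # False models a temporarily disabled limiter


def passes(metrics, limiter):
    """Check one site's metrics against one limiter (missing values count as 0)."""
    value = metrics.get(limiter.group, {}).get(limiter.metric, 0)
    return limiter.lower <= value <= limiter.upper


def apply_limiters(sites, limiters):
    """Return the sites that satisfy all enabled limiters."""
    active = [lim for lim in limiters if lim.enabled]
    return {name: metrics for name, metrics in sites.items()
            if all(passes(metrics, lim) for lim in active)}
```

Comparing `len(sites)` with `len(apply_limiters(sites, limiters))` gives one possible reading of the value shown in a limiter's box: how many answers remain once the limiter is set.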
These limiters require more detailed information from the search-engine than is currently available.
The first implementation of the interface will not correct for overlapping hits:
when two words from the same group occur on the same page, that page will be
counted twice. Hence, the histogram will not show the
real distribution of suitable sites. This is certainly not optimal,
considering that many
words will overlap because they are related.
The reason not to implement the best solution is the exponential
behavior of this data: the search engine would have to recalculate all possible
combinations, and do so again for each word every time a word is added to,
moved within, or deleted from the list of selected keywords. The intention is
to have the spider produce many word suggestions to
pinpoint the question optimally, so exponential behavior would be destructive.
By simply ignoring the overlaps, the situation reduces to simple linear lookups.
Experiments will show whether this simplification gives acceptable
results.
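
The effect of the simplification can be illustrated with a small Python sketch. It assumes, purely for illustration, that the set of pages hit by each word is known; the function names and the `pages_by_word` mapping are hypothetical. The additive count is what the linear shortcut yields, while the exact count treats a page shared by two words as a single page.

```python
def pages_hit_additive(pages_by_word, group):
    """Linear shortcut: add the per-word page counts, ignoring overlaps."""
    return sum(len(pages_by_word[word]) for word in group)


def pages_hit_exact(pages_by_word, group):
    """Exact count: take the union so that a shared page counts only once."""
    pages = set()
    for word in group:
        pages |= pages_by_word[word]
    return len(pages)


# Two related words that share two pages on the same site.
pages_by_word = {
    "car": {"p1", "p2", "p3"},
    "auto": {"p2", "p3", "p4"},
}
group = ["car", "auto"]
```

Here the additive count is 6 and the exact count is 4. The sketch shows only the double counting, not the cost: if an engine can return nothing but counts, correcting for overlaps by inclusion-exclusion needs a count for every non-empty combination of the n words in a group, that is 2^n - 1 lookups, whereas the additive shortcut needs n.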