Internet

2. Search tools

Search tools can be loosely divided into two kinds: reference-type tools
(directories) and pure retrieval systems (search engines).

2.1 Thematic catalogues

Search tools of the first type are more often called subject, or thematic,
catalogues. The company that owns such a catalogue carries out an enormous,
continuous effort: it investigates, describes, catalogues and sorts the
contents of WWW servers and other network resources scattered around the
world. The result of this titanic effort is a constantly updated
hierarchical catalogue. Its top level holds the most general categories,
such as "business", "science" and "art", while the elements of the lowest
level are links to individual WWW pages and servers, each with a brief
description of its contents.

Nobody can guarantee that such a catalogue really covers the entire
contents of the WWW. However, the possible incompleteness of the selection
is more than made up for by something that no computer can yet manage:
meaningful selection of the material.

Subject catalogues also offer keyword search. This search, however, runs
not over the contents of the WWW servers themselves but over the brief
descriptions of them stored in the catalogue.
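
The structure described above can be illustrated with a minimal sketch. It
assumes a catalogue modelled as a tree of categories whose entries carry a
URL and a short description; the class names, the example.org URLs and the
descriptions are hypothetical and are not taken from any real catalogue.
The point it shows is that the keyword search looks only at the stored
descriptions, never at the pages themselves.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    url: str
    description: str  # short text written by the catalogue's editors


@dataclass
class Category:
    name: str
    entries: list = field(default_factory=list)
    subcategories: list = field(default_factory=list)


def search_catalogue(category, keywords, path=()):
    """Yield (path, entry) for entries whose description has every keyword."""
    here = path + (category.name,)
    wanted = [k.lower() for k in keywords]
    for entry in category.entries:
        text = entry.description.lower()
        # Only the description is examined, not the page behind the URL.
        if all(k in text for k in wanted):
            yield here, entry
    for sub in category.subcategories:
        yield from search_catalogue(sub, keywords, here)


# Hypothetical two-level catalogue.
root = Category("Top", subcategories=[
    Category("Science", entries=[
        Entry("http://example.org/physics", "Lecture notes on particle physics"),
    ]),
    Category("Art", entries=[
        Entry("http://example.org/gallery", "Online gallery of modern painting"),
    ]),
])

hits = [(" > ".join(p), e.url) for p, e in search_catalogue(root, ["physics"])]
print(hits)
```

Each hit carries the path of sections leading to it, so a search result
can also serve as an entry point into the hierarchy.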

The subject catalogues on the Internet can literally be counted on one's
fingers, because creating and maintaining them is very expensive. The best
known include Yahoo, the WWW Virtual Library, Galaxy and a few others.

Yahoo.

Yahoo is the most popular catalogue among Internet users. Its first page,
at http://www.yahoo.com, gives access to the two basic ways of working with
the catalogue: keyword search and the hierarchical tree of sections.

As you descend through the sections of the catalogue, you will see that
each section contains exactly the same keyword input field and a Search
button that starts the search.

Each section may contain both a list of its subsections and links to pages
that relate to the section as a whole, together with their brief
descriptions.

Instead of browsing, you can jump straight to the right place in the Yahoo
catalogue by searching. Enter one or more keywords separated by spaces in
the search field and press the Search button, and you receive a list of
everything in Yahoo that contains those keywords. The list is divided into
two parts: "categories" and "sites".

If the search returns more than 25 links in total, the list is split into
several parts.

Magellan.

It often happens, however, that the list returned by the search engine is
so long that looking through it is simply unrealistic. One way out is
stricter selection of the information entered into the catalogue. One of
the best known systems of this kind is the Magellan catalogue, at
http://www.mckinley.com

Its database holds information on 80 thousand WWW pages, which is very
little compared with the millions that exist on the network. But whereas
Yahoo describes a resource in one or two lines of text, the Magellan staff
write short reviews of some of the pages in their database and also rate
the quality of these resources on a five-point scale. Besides the database
of reviews, Magellan also has its own automatic index; to search it, set
the switch under the input field to the "entire database" position.

As a rule, a query consists of one or more keywords separated by spaces.

Point.

A service built on similar principles, run by the firm Point
(http://www.pointcom.com), puts its main emphasis not on search but on
work with the thematic catalogue.

Point is known on the network because its staff constantly evaluate
network resources and keep lists of the sites they consider to belong to
"the best five percent of the WWW".

Point maintains a common database of all the "five-percent" WWW pages,
where a detailed review of each one can be read.

Virtual Library.

The oldest subject catalogue of the WWW is the Virtual Library:
http://www.w3.org/hypertext/DataSources/bySubject/Overview.html

This system covers the scientific layer of the WWW fairly completely: the
servers of universities, laboratories and educational institutions.

Russia-On-Line Subject Guide.

For users in our country, the thematic catalogue Russia-On-Line Subject
Guide, at http://www.online.ru/rmain, may be of some interest. It contains
a rather motley collection of links to foreign sources plus a thematic
review of Russian and Russian-language WWW resources.

2.2. Automatic indexes.

The problem of finding information on the Internet can also be approached
from the other side. Imagine a program loaded with a few thousand known
URLs. Started on a computer with access to the WWW, the program
automatically begins downloading the documents at these URLs; from each new
document it extracts all the links it contains and adds them to its
database of addresses. Since all WWW documents are ultimately linked to one
another, sooner or later such a program will traverse the whole Internet.
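
The traversal just described can be sketched in a few lines. This is only
an illustration under stated assumptions: instead of a real network it
uses a hypothetical in-memory "web" (the FAKE_WEB dictionary and the
*.example URLs are invented), links are extracted with a deliberately
naive regular expression, and the queue discipline (breadth-first) is one
possible choice rather than anything the text prescribes.

```python
import re
from collections import deque

# Hypothetical miniature "web": URL -> page text containing links.
FAKE_WEB = {
    "http://a.example/": 'Start <a href="http://b.example/">B</a> '
                         '<a href="http://c.example/">C</a>',
    "http://b.example/": 'Page B <a href="http://c.example/">C</a>',
    "http://c.example/": 'Page C <a href="http://a.example/">A</a> '
                         '<a href="http://d.example/">D</a>',
    "http://d.example/": "Page D, no links",
}

# Naive link extraction; real HTML needs a proper parser.
LINK_RE = re.compile(r'href="([^"]+)"')


def fetch(url):
    """Stand-in for a network download: return the page text or ''."""
    return FAKE_WEB.get(url, "")


def crawl(seed_urls, max_pages=100):
    """Visit pages breadth-first, adding every newly seen link to the queue."""
    queue = deque(seed_urls)
    seen = set(seed_urls)   # the robot's "database of addresses"
    pages = {}              # url -> downloaded text
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        text = fetch(url)
        pages[url] = text
        for link in LINK_RE.findall(text):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


pages = crawl(["http://a.example/"])
print(list(pages))
```

The `seen` set is what keeps the robot from looping forever on pages that
link back to each other, and `max_pages` is an arbitrary safety limit added
for the sketch.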

Of course, the program cannot understand or classify what it sees on the
network. Programs of this type are called robots. They are limited to
collecting statistical information and building indexes over the texts of
the documents. The database collected by the robot, the index, stores, put
simply, information about which WWW documents contain which words.
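
A common way to store "which documents contain which words" is an inverted
index: a mapping from each word to the set of documents in which it occurs.
The sketch below is a minimal illustration of that idea, not a description
of how any of the systems named here is actually built; the document texts
and *.example URLs are invented.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def lookup(index, *words):
    """Return the URLs that contain all of the given words."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()


# Hypothetical documents, as a robot might have collected them.
docs = {
    "http://a.example/": "Search engines build an index of words",
    "http://b.example/": "A catalogue is compiled by people",
    "http://c.example/": "The robot adds words to the index",
}
index = build_index(docs)
print(sorted(lookup(index, "index", "words")))
```

Answering a query then requires no scan of the documents themselves, only a
few set lookups and an intersection.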

Such an automatically collected index underlies retrieval systems of the
second kind, which are often simply called automatic indexes.

An automatic index consists of three parts: the robot program, the
database collected by this robot, and the search interface to this
database, with which the user works. All these components can function
without human intervention.

Since such systems do not classify their material in any way, they are
worth using only when you know exactly which keywords relate to what you
need: say, a person's surname or a few sufficiently rare terms from the
relevant field. If you search for words that are at all common, a lifetime
will not be enough to visit all the URLs returned. The Alta Vista index,
for example, contains 11 billion words taken from 30 million WWW pages.

There are many automatic indexes of WWW pages: WebCrawler, Lycos, Excite,
Inktomi, Open Text and others. Some of them (Lycos, for example) are a more
or less successful synthesis of a subject catalogue and an automatic index.

Alta Vista.

Its address is http://altavista.digital.com. The system appeared in
December 1995. It is one of the largest of all such retrieval systems in
index volume and has the most powerful and flexible rules for constructing
queries. Alta Vista understands two query languages that differ from each
other quite substantially. The first page of Alta Vista shows the form for
simple search (Simple Search), and the header panel at the top of the page
contains the Advanced Search button, which opens the form for advanced
search.

Besides WWW pages, Alta Vista maintains a separate index of articles from
more than 14,000 Usenet newsgroups (including the relcom.* hierarchy).

Searching Alta Vista: to make Alta Vista match a group of words only when
they stand next to each other, enclose the group in quotation marks. To
exclude from the results all documents containing a certain word, prefix
that word with a minus sign.

A word without any sign works in a query exactly like the same word with a
plus sign.

Unlike Yahoo, Alta Vista by default searches for whole words. The requested
terms must stand in the document on their own rather than as part of other
character strings. If you need to find all occurrences of a word, even
where it forms part of other words, use the * symbol. The asterisk may
stand only at the end of a word, and, to avoid returning too many results,
Alta Vista requires that a word ending in * consist of at least 3 letters.
Moreover, * matches not just any ending but only endings no longer than
five characters that contain no capital letters or digits.
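
The query rules described above (quoted phrases, the plus and minus signs,
whole-word matching and the restricted * wildcard) can be modelled with
regular expressions. The sketch below is an interpretation of the rules as
stated in this text, not Alta Vista's real implementation: the function
names are invented, matching is case-sensitive for simplicity, and queries
are split with `shlex`, so a stray apostrophe in a query would need extra
handling.

```python
import re
import shlex


def parse_query(query):
    """Split a query into (required, excluded) terms.

    Quoted groups stay together; an unmarked word is treated like '+word',
    as the text describes.
    """
    required, excluded = [], []
    for token in shlex.split(query):
        if token.startswith("-"):
            term, bucket = token[1:], excluded
        else:
            term, bucket = token.lstrip("+"), required
        if term:
            bucket.append(term)
    return required, excluded


def word_regex(word):
    """Regex source for one word, honouring the trailing-* rule."""
    if word.endswith("*"):
        stem = word[:-1]
        if len(stem) < 3:
            raise ValueError("a word ending in * needs at least 3 letters")
        # Ending: at most five characters, lowercase letters only.
        return re.escape(stem) + r"[a-z]{0,5}"
    return re.escape(word)


def term_pattern(term):
    """Compile a whole-word pattern; a phrase needs its words adjacent."""
    body = r"\s+".join(word_regex(w) for w in term.split())
    return re.compile(r"\b" + body + r"\b")


def matches(query, text):
    """True if text has every required term and no excluded term."""
    required, excluded = parse_query(query)
    return (all(term_pattern(t).search(text) for t in required)
            and not any(term_pattern(t).search(text) for t in excluded))


print(matches('"search engine" -catalogue', "a search engine for the web"))
print(matches("comput*", "two computers"))
print(matches("comput*", "computerization"))
```

Under these assumptions `comput*` finds "computers" (a three-letter
lowercase ending) but not "computerization", whose ending is longer than
five characters.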

Like Yahoo, Alta Vista returns search results as a list of links to
documents, but instead of a description, next to each document's title you
simply see the first few lines of its text. If more than 10 documents are
found, Alta Vista splits the list into pages of 10 links each. Alta Vista
sorts the links so that the "most important" documents containing your
keywords come first. The degree of importance is determined by the
following factors:

Whether the keywords appear in the document's title;

Whether these words occur in the first few lines of the document;

How close to each other the keywords are found in the text.
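
The three factors listed above can be combined into a toy scoring
function. The text does not say how Alta Vista weighs them, so the weights
below (2 per keyword in the title, 1 per keyword near the start, and a
proximity bonus of at most 1) are pure assumptions chosen for illustration,
as is the approximation of "the first few lines" by the first 30 words.

```python
import re


def tokens(text):
    """Lowercase words of a text."""
    return re.findall(r"\w+", text.lower())


def min_span(words, keywords):
    """Length in words of the shortest window holding every keyword.

    Returns None if some keyword is missing.
    """
    wanted = set(keywords)
    best = None
    for start, word in enumerate(words):
        if word not in wanted:
            continue
        missing = set(wanted)
        for end in range(start, len(words)):
            missing.discard(words[end])
            if not missing:
                span = end - start + 1
                if best is None or span < best:
                    best = span
                break
    return best


def score(title, body, keywords, lead_words=30):
    """Toy importance score built from the three factors (assumed weights)."""
    keys = [k.lower() for k in keywords]
    title_words = set(tokens(title))
    body_words = tokens(body)
    lead = set(body_words[:lead_words])
    total = 2.0 * sum(k in title_words for k in keys)   # factor 1: title
    total += 1.0 * sum(k in lead for k in keys)         # factor 2: first lines
    span = min_span(body_words, keys)                   # factor 3: proximity
    if span is not None:
        total += len(set(keys)) / span
    return total


# Hypothetical documents as (title, body) pairs.
docs = [
    ("Cooking at home",
     "This page mentions search once and, much later in a long text "
     "about food, an engine."),
    ("Search engine guide", "A search engine indexes pages."),
]
query = ["search", "engine"]
ranked = sorted(docs, key=lambda d: score(d[0], d[1], query), reverse=True)
print([title for title, _ in ranked])
```

Splitting the ranked list into pages of 10 links would then be a matter of
slicing `ranked` in steps of 10.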

Infoseek

Infoseek, which went into operation at the end of 1996, somewhat resembles
Alta Vista, although the volume of full-text documents it covers does not
yet exceed 30 million Web pages. Its address is http://www.infoseek.com. It
is a fairly powerful system that is fast and simple to use. Its query
facilities are almost the same as Alta Vista's, but not as rich. While it
keeps almost unchanged the meanings of the plus sign, minus sign and
quotation marks, the sensitivity to upper and lower case letters, and the
ability to restrict a search to parts of Web pages, Infoseek cannot yet
find terms standing near each other (there is no NEAR operator), restrict
a search by the date a source was updated or, most importantly, truncate
the endings of key terms.

On the other hand, this retrieval system offers a mass of optional
functions. Among them is the ability to find out how many links on the WWW
point to a particular page, that is, to judge how popular it is, or,
conversely, to find out how many links to external pages a given site
contains (more precisely, how many of them are reflected in the Ultraseek
index files). The special Imageseek function finds images (drawings,
photographs) on a given subject on the Internet. Infoseek also has one of
the best directories of network resources.

HotBot

HotBot, which holds information on the full text of 110 million pages, can
be counted among the powerful search tools of the World Wide Web. Its
address is http://www.hotbot.com. HotBot is one of the newest systems, so
its advanced search offers surprisingly broad possibilities for refining a
query. This is achieved through a multi-level menu offering various ways of
composing the search instruction. You can search for the presence of one or
more terms in a document, search for an exact phrase, search for a
particular person, or search for links to a particular electronic address.
To refine the query further, the conditions SHOULD (may contain), MUST
(must contain) and MUST NOT (must not contain) can be applied to any term.
In addition, HotBot makes it possible to restrict a search by the date the
document was created or last updated, or by the geographic location of the
server. The peak of its service is the search for documents containing
certain types of files, for example video. All that is needed is to tick
the corresponding item in the search menu.
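
The SHOULD / MUST / MUST NOT conditions mentioned above can be illustrated
with a small filter. This is a sketch under assumptions: the function name
and signature are invented, and treating SHOULD terms as a simple count
used for ordering the accepted documents is one plausible reading, not
documented HotBot behaviour.

```python
import re


def evaluate(text, must=(), must_not=(), should=()):
    """Return None if the document is rejected, else its SHOULD-term count.

    MUST terms all have to be present, MUST NOT terms all have to be
    absent; SHOULD terms are optional and only raise the returned count.
    """
    words = set(re.findall(r"\w+", text.lower()))
    if any(term.lower() not in words for term in must):
        return None
    if any(term.lower() in words for term in must_not):
        return None
    return sum(term.lower() in words for term in should)


print(evaluate("Video search on the web",
               must=["search"], must_not=["catalogue"],
               should=["video", "audio"]))
print(evaluate("catalogue search", must=["search"], must_not=["catalogue"]))
```

A document that passes the MUST and MUST NOT checks but contains no SHOULD
term is still accepted, with a count of 0.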

WebCrawler.

This is another search tool of the search-bot (search robot) type.

Its address is http://www.webcrawler.com. Searching here is very simple:
enter as many keywords as possible in the search field and press Search.

Lycos.

This is a large database holding the contents of all the pages it has found
on the Web.

Its address is http://www.lycos.com.

World Wide Web Worm.

You will find this search tool at
http://www.cs.colorado.edu/home/mcbryan/wwww.html. It is another extensive
index of Web sites.

In each particular case it is worth using the search tool that suits it.
Try a search with one tool and, if it gives no results, move on to another.
But which tool should you use? It is best to start with a thematic
catalogue such as Yahoo: such catalogues are fairly small, but they are
fast. If you fail to find the information you need, this means either that
you are interested in too narrow a subject or that the keywords you chose
match your subject poorly. It does not mean that the information is absent
from the WWW, only that it will be harder to find. To find it you will have
to turn to more primitive, more automatic and therefore more universal
systems such as Alta Vista.

2.3. Russian retrieval systems

Global retrieval systems concentrate mainly on the English-language
resources of the network. Searching for information on servers within
individual countries is handled by local systems specially adapted to the
features of particular languages. Such search tools exist in Russia too.
What unites them all is the ability to process material in all Cyrillic
encodings. In capacity and in the level of service offered, however, the
Russian retrieval systems differ considerably from one another.

Rambler, "Апорт" and "Яndex" currently form the leading group of systems.

Rambler

Among the leaders, Rambler (http://www.rambler.ru) stands out as the first
professional domestic retrieval system. It provides full-text search over
3 million pages located on more than 15 thousand Web sites in Russia and
the countries of the near abroad. Besides Web servers, it also covers a
one-week archive of news from the relcom hierarchy.

Rambler's presentation of search results is close to optimal. Even in the
normal form, the link to a found object carries complete information. The
system is designed so that the same document in different encodings is
shown only once, with its specific addresses gathered in a list. This
reduces the time needed to analyse the results, because the same documents
are not duplicated.
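
One way to recognise the same document served in several Cyrillic
encodings is to decode every copy to a common form and compare
fingerprints of the decoded text. The sketch below illustrates that idea
only; it is not Rambler's actual mechanism. It assumes that each page's
encoding is already known (a real system would have to detect it), and the
example.ru URLs and sample texts are invented.

```python
import hashlib
from collections import defaultdict


def fingerprint(raw, encoding):
    """Decode the bytes, normalise whitespace, and hash the result."""
    text = raw.decode(encoding)
    normalized = " ".join(text.split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def group_duplicates(pages):
    """Group URLs whose decoded text is identical.

    pages is an iterable of (url, raw_bytes, encoding) triples.
    """
    groups = defaultdict(list)
    for url, raw, encoding in pages:
        groups[fingerprint(raw, encoding)].append(url)
    return list(groups.values())


# The same hypothetical text stored in two Cyrillic encodings,
# plus one unrelated document.
text = "Поисковые системы Интернет"
pages = [
    ("http://example.ru/koi/page.html", text.encode("koi8-r"), "koi8-r"),
    ("http://example.ru/win/page.html", text.encode("cp1251"), "cp1251"),
    ("http://example.ru/other.html", "Другой документ".encode("cp866"), "cp866"),
]
groups = group_duplicates(pages)
print(groups)
```

Each group corresponds to one line in the result list, with the group's
URLs shown as the document's alternative addresses.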

Rambler's main shortcoming is that it cannot search for an exact phrase, or
even let a query specify the maximum distance between the required terms. A
chance combination of completely unrelated words therefore produces links
to documents that are entirely irrelevant to the query.

Апорт

The retrieval system Апорт (http://www.апорт.ru) is equipped with a mass of
functions that place it among the most user-friendly.

One of Апорт's main advantages is its rich facilities for composing
queries. Besides the traditional operators "and" and "or" and search for an
exact phrase, the system can pick out combinations of terms located next to
each other in the text. Апорт offers machine translation of a query from
Russian into English and vice versa. Like Rambler, Апорт can recognise the
same document in different encodings and return the link to it only once,
listing the specific addresses in a list of URLs. Unfortunately,
information about outdated versions of the same page is not removed in
time; they are listed as existing and differ only in the date of updating.
Another shortcoming of this system is that page titles are not always
processed correctly, so that search results frequently show "Document
without a title".

Яndex

The retrieval system Яndex (http://www.yandex.ru) indexes, besides servers
in the "ru" and "su" domains, the contents of foreign Russian-language Web
sites.

The main distinctive feature of this system is deep morphological analysis
of the terms being processed. Its very powerful linguistic support makes it
possible to take into account practically every shade of keyword usage and
to make the search as precise as possible. Яndex has a good mechanism for
recognising a single document in several encodings or on mirror servers.

Behind the leading Russian trio come several more search tools, among them
the "Russian Search Machine" (http://search.interrussia.com), "TELA-search"
(http://tela.dux.ru/) and Russian Internet Search (http://www.search.ru).
So far none of these servers stands out for either breadth of search or
convenience, and they can be used only as a supplement to the leading
search tools.

Search services in the Russian part of the Internet, as everywhere in the
world, are developing rapidly. There is no doubt that in the near future
the performance of the existing systems will improve and new generations of
search tools will appear, giving users even greater possibilities.
