World Wide Web Search Engines

This article appeared in the Fall 1995 issue of alphaBYTES, newsletter of the Le Moyne College Library.

At the same time that subject access to the Internet is getting easier, it is being complicated by the number of ways one can choose to search it.

There are Internet catalogs that offer subject trees, such as CERN, GNN's Whole Internet Catalog, Library of Congress' Marvel, Yahoo, and Yanoff's List. Subject trees function much like a classified table of contents. Searchers are presented with a list of broad subject headings subdivided by a few subheadings. Searchers choose a broad topic as a launching pad for browsing the Internet sites available in that catalog.

Some WWW search tools have search engines which function more like an automated index. Web search engines are growing in number and offer a variety of search techniques from the simple to the complex. Engines have two components. The part of the engine that we SEE is the search form and query box. What we do NOT see are the Web "spiders" that are at work. Web search engines operate in concert with spiders or software robots that search through hypertext documents and seek out relevant information. The spider reports back to the host catalog.
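
As a rough illustration of that division of labor, the sketch below imitates a spider in Python. Everything in it is an assumption made for the example: the three-page PAGES table stands in for the Web, the example URLs are invented, and crawl is a hypothetical name. None of the engines named in this article is claimed to work exactly this way.

```python
from collections import deque

# Hypothetical miniature "Web": each URL maps to its text and the links it contains.
PAGES = {
    "http://a.example/": ("library catalogs on the internet",
                          ["http://b.example/"]),
    "http://b.example/": ("spiders index hypertext documents",
                          ["http://a.example/", "http://c.example/"]),
    "http://c.example/": ("internet search engines", []),
}

def crawl(start_url, pages):
    """Follow links breadth-first and return a word -> set-of-URLs catalog."""
    catalog = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        text, links = pages[url]
        # "Report back" to the host catalog: record every word found on the page.
        for word in text.split():
            catalog.setdefault(word, set()).add(url)
        # Queue each link that has not been visited yet.
        for link in links:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return catalog

catalog = crawl("http://a.example/", PAGES)
print(sorted(catalog["internet"]))
```

The search form is then only a lookup in the catalog the spider has already built, which is why a page the spider never reached cannot turn up in a search.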

The Web search tools that use a spider to build their catalogs are, taking them one at a time with the smallest first, WWW Worm, WebCrawler, InfoSeek, and Lycos. Yahoo does not qualify as a spider, but it is included because of its friendly search engine.

Yahoo: The Little Engine That Can
www.yahoo.com

Although Yahoo is a subject tree at heart, it offers a little search engine to help seekers cut to the chase. It operates as an index to its subject headings for its 50,000 Internet sites. It does not operate truly as a spider by reaching out into the Internet to search for an ever expanding list of resources. Instead, people submit requests that Yahoo carry the home pages they have developed. Yahoo is an excellent place to begin browsing the Web.

The "Worm"
wwww.cs.colorado.edu/wwww

Falling out of favor is the World Wide Web Worm, sponsored by the University of Colorado. Few campus-wide information services are providing links to it any longer. It is currently found on Le Moyne's Home Page.

The Worm indexes URLs in hypertext (text that is highlighted in home pages). Consequently, if a site is not cited in a hypertext page, it will not be known to the Worm. The service contains 3 million URL names. It has four types of search databases, with citation hypertext and citation addresses (URLs) being the largest. Three search techniques are provided: two keyword techniques and a slower UNIX regular expression search for specialists. Boolean operators AND and OR may be invoked after choosing one of the four databases.
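
To make the idea of a citation index concrete, here is a minimal Python sketch that collects only the highlighted link text and the addresses it points to, then searches them with a regular expression. It is an illustration under stated assumptions, not the Worm's own code: AnchorCollector, search, and the sample page are hypothetical, and college.example and unlinked.example are invented addresses.

```python
import re
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect (link text, URL) pairs from the hypertext links in one page."""

    def __init__(self):
        super().__init__()
        self.citations = []
        self._href = None   # URL of the link currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Keep text only while inside a link; ordinary text is ignored.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.citations.append(("".join(self._text).strip(), self._href))
            self._href = None

def search(citations, pattern):
    """Return URLs whose link text or address matches the regular expression."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [url for text, url in citations if rx.search(text) or rx.search(url)]

# Hypothetical sample page: two cited sites and one address that is not a link.
page = ('<p>See the <a href="http://college.example/">Le Moyne College</a> page '
        'and <a href="http://www.yahoo.com/">Yahoo</a>. '
        'http://unlinked.example/ is mentioned but not linked.</p>')

collector = AnchorCollector()
collector.feed(page)
hits = search(collector.citations, r"le\s+moyne")
print(hits)
```

The address that appears only as ordinary text never enters the index, which mirrors the limitation noted above: a site that no hypertext page cites stays invisible.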

WebCrawler
webcrawler.com

Relevancy ranking is WebCrawler's claim to fame. The first item retrieved in a search will be rated with the top score of 100. The next items are ranked relative to that first site in terms of how many times the search terms appear in the document. WebCrawler is somewhat imprecise in its rankings, and searchers must call up retrieved documents based solely on their titles and their rankings. The strength of WebCrawler is that the contents of the documents are searched, but excerpts of the contents are not presented in the initial list of search results. Boolean operators AND and OR can be invoked on the search screen under "Find Pages," where "ALL" or "ANY" can be selected before executing the search. The words AND and OR are not to be used in the search statement itself. The trick with this spider is to avoid being too specific, but still use words that uniquely identify what you are looking for.
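
The relative scoring described here can be sketched in a few lines of Python. This is a simplified guess at the arithmetic, not WebCrawler's actual formula: the rank function, its mode parameter, and the three sample documents are all hypothetical, and the score is simply the count of search-term occurrences scaled so that the best document receives 100.

```python
def rank(documents, terms, mode="ANY"):
    """Score documents by term frequency, scaled so the top document gets 100."""
    terms = [t.lower() for t in terms]
    raw = {}
    for title, text in documents.items():
        words = text.lower().split()
        counts = [words.count(t) for t in terms]
        # "ALL" behaves like Boolean AND, "ANY" like Boolean OR.
        matched = all(counts) if mode == "ALL" else any(counts)
        if matched:
            raw[title] = sum(counts)
    if not raw:
        return []
    top = max(raw.values())
    scored = [(title, round(100 * n / top)) for title, n in raw.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical documents, keyed by title.
documents = {
    "Laser Printers": "laser printer reviews laser printer prices",
    "Ink Jets": "ink jet printer guide",
    "Lasers in Physics": "laser physics",
}

any_hits = rank(documents, ["laser", "printer"], mode="ANY")
all_hits = rank(documents, ["laser", "printer"], mode="ALL")
print(any_hits)
print(all_hits)
```

Because the searcher sees only titles and scores, a ranking like this says how often the words occur, not whether the page is actually about the topic, which is one reason the rankings can feel imprecise.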

InfoSeek
www.infoseek.com

Getting top ratings of all Internet search tools, InfoSeek provides quick access to Web pages, Usenet news, newswires, and business, computer, health, and entertainment publications. InfoSeek allows for plain English search statements. In a search of "what is the meaning of life," the words "is," "the," and "of" are not searched. The longer words are searched with equal weight. Search results are provided with relevance rankings, the first few words of the document, and the title.

While the plain English approach can lure searchers by its seeming ease of use, there are plenty of syntax rules that make InfoSeek a veritable mine field. Proper names must be capitalized and, if there is more than one proper name in the search statement, they must be separated by commas. A plus sign can be placed in front of a required word or phrase, just as a minus sign can be placed in front of an undesired word. For example, printer +laser +color fast best cheapest would be an effective search for evaluative material about color laser printers to purchase. python -Monty would be the way to search for the snake and not the comedy team. Be careful to leave no space between a symbol and its word.
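
The plus and minus rules, together with the skipped short words, can be modeled with a small Python sketch. It is only an approximation built from the rules stated in this article: parse_query, matches, and STOP_WORDS are hypothetical names, the stop list holds just the three words mentioned earlier, and the rules for capitalized proper names and commas are not handled.

```python
# Assumed stop list: only the three short words named in the article's example.
STOP_WORDS = {"is", "the", "of"}

def parse_query(query):
    """Split a search statement into required, excluded, and optional words."""
    required, excluded, optional = [], [], []
    for token in query.split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        elif token.lower() not in STOP_WORDS:
            optional.append(token)
    return required, excluded, optional

def matches(text, query):
    """Report whether a document's text satisfies the search statement."""
    required, excluded, optional = parse_query(query)
    words = set(text.lower().split())
    if any(w.lower() in words for w in excluded):
        return False
    if not all(w.lower() in words for w in required):
        return False
    # With no required words, at least one optional word must appear.
    return bool(required) or any(w.lower() in words for w in optional)

print(parse_query("printer +laser +color fast best cheapest"))
print(parse_query("what is the meaning of life"))
print(matches("the python is a large snake", "python -Monty"))
print(matches("monty python comedy team", "python -Monty"))
```

Note that a space between the symbol and the word would leave the symbol standing alone as its own token, which is why the sketch, like the advice above, expects them to be joined.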

There is no Boolean searching or truncation.

Lycos
www.lycos.com

What's in a name? Lycos is short for Lycosidae, a family of spiders known for their aggressive hunting. Like its namesake, which runs over the ground in pursuit of its prey, Lycos scours the Internet filling its database, bringing in thousands of documents daily. With its 8 million sites, or 91% of the Web, Lycos is hands down the largest catalog. The next largest Internet catalog contains 13%.

Lycos searches URLs for words appearing in the document title, headings, subheadings, and the first 20 lines of text. Searches are conducted through an optional search form or directly in the query box on the home page. The search form is for fine-tuning search strategies. Boolean searching is available, as is a relevancy strategy of selecting how close a match you want among search terms. "Strong" matches can save time, space, and money by reducing the irrelevant items.
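
A consequence of indexing only part of each document is easy to show in Python. The sketch below is an assumption-laden illustration rather than Lycos's actual method: indexable_words and its max_lines parameter are hypothetical, and the sample document is invented.

```python
def indexable_words(title, headings, body, max_lines=20):
    """Return the words taken from the title, headings, and first lines of text."""
    first_lines = body.splitlines()[:max_lines]
    text = " ".join([title, *headings, *first_lines])
    # Strip common punctuation and lowercase each word before indexing.
    words = {word.strip(".,;:!?\"'()").lower() for word in text.split()}
    return words - {""}

# Hypothetical 25-line document; each line carries one distinctive word.
body = "\n".join(f"filler text number{n}" for n in range(1, 26))
words = indexable_words("Spider Facts", ["Hunting Habits"], body)

print("number20" in words)
print("number21" in words)
```

A term that first appears on line 21 or later is never indexed in this model, so a search for it would miss the document even though the word is in the full text.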

Results are presented conveniently with relevancy score, document outline, keywords, and an excerpt or annotation. Searchers can then make a more informed choice before retrieving full documents. An optional default number of items will be displayed. Select "next 10 hits" at the bottom to see more of what was retrieved.

Happy trails and don't forget to bookmark!

by Inga Barnello