Major Search Engines and Directories in 2001
© 2001 Walt Howe and Hope Tillman
Each of these Internet search engines lets you find web sites and Internet files that include the words you search for. They are all different, so when one does not produce what you are looking for, try another. They all work a little differently, too, so take time to read their help files to improve your searches. For more help, see our Guide to Boolean Searching. See also our Guide to Specialized Search Engines.
- Northern Light (http://www.nlsearch.com)
Northern Light is the professional searchers’ search engine, offering full-text search of “every page out there™” (about 500++ million pages) as well as licensed, for-fee sources from 5500 journals, reference works, and newswires “not found on the web”. Most recently, it has added a for-fee search of government documents, although much of that database is available through the free search, too. It supports full Boolean searching with +, -, OR, NOT, AND, and parentheses. Use double quotation marks around terms that should appear consecutively. It is best to use a number of terms; the search results will provide hits that use most of them. Use OR between search terms to get broader results. Use parentheses to avoid ambiguity in resolving complex Boolean expressions.
Northern Light also categorizes all documents that it indexes as an excellent aid to refining a search. When an initial search is performed, Custom Search Folders are listed on the left side of the screen that can be used to see the results by a particular subject, type, or source. Northern Light has the ability to search by date, by URL, and other refinements, too.
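The Boolean operators described above for Northern Light can be illustrated with a small sketch. This is not Northern Light's implementation; the function names (`tokenize`, `contains_phrase`, `matches`) and the sample page are hypothetical, and the code only shows how required terms (+ or AND), excluded terms (- or NOT), an OR group, and quoted phrases might be tested against one page of text.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(tokens, phrase):
    """True if the words of `phrase` appear consecutively in `tokens`."""
    words = tokenize(phrase)
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))

def matches(text, required=(), excluded=(), any_of=()):
    """Hypothetical matcher: `required` plays the role of + or AND,
    `excluded` the role of - or NOT, and `any_of` the role of an OR group.
    Multi-word entries are treated like quoted phrases."""
    tokens = tokenize(text)
    if not all(contains_phrase(tokens, term) for term in required):
        return False
    if any(contains_phrase(tokens, term) for term in excluded):
        return False
    if any_of and not any(contains_phrase(tokens, term) for term in any_of):
        return False
    return True

page = "Public libraries offer free Internet access and reference services."

# "internet access" AND (library OR libraries)
print(matches(page, required=["internet access"], any_of=["library", "libraries"]))  # True

# +internet -reference
print(matches(page, required=["internet"], excluded=["reference"]))  # False
```

The OR group broadens the search by accepting either word form, while the quoted phrase narrows it by requiring the two words side by side.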
- Google (http://www.google.com)
Google is a new, next-generation search engine of about a billion pages from several Stanford PhD students. It has followed Yahoo, Excite, and WhoWhere in moving from a student project to a commercial site. Its relevance ranking uses two factors not generally included in search engine rankings: the number of links to the page from elsewhere and the “importance” of the pages that link to it. Thus, if Yahoo links to a page, that page is treated as important and will rank higher in the list than a page linked only from someone’s “unimportant” personal page. Other ranking factors are the number of hits on the search words in the title and the text, and the proximity of the search terms to each other. By default, all words are ANDed, but you can now add OR between words. A minus sign is used as a NOT. Common words (stopwords) are not included in the search unless preceded by a +. Stemming is not supported; a search for evaluating will not find the word evaluation. Google recommends searching for only a few words, not many, and letting the relevance ranking work for you. It works surprisingly well!
Google has just added indexing of .pdf files, which most search engines cannot reach, due to their special formatting. This is a major step forward! Try it!
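To make the link-based ranking idea concrete, here is a minimal sketch of one common way to score pages by the number and "importance" of the pages linking to them. It is an illustration only, not Google's actual algorithm; the function name `link_rank`, the `damping` value, and the tiny link graph are all assumptions made for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages so that a link from a high-scoring page counts for more.

    `links` maps each page name to the list of pages it links to.
    This is a simplified, hypothetical ranking, not any engine's real formula.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# "directory" is linked from three pages; "personal_page" from none.
links = {
    "directory": ["site_a"],
    "personal_page": ["site_b"],
    "fan_1": ["directory"],
    "fan_2": ["directory"],
    "fan_3": ["directory"],
}
ranks = link_rank(links)
print(ranks["site_a"] > ranks["site_b"])  # True
```

Both `site_a` and `site_b` have exactly one incoming link, but `site_a` scores higher because the page linking to it is itself well linked.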
- Fast (http://www.alltheweb.com)
Fast, from Norway, has jumped to become the largest search engine at 300 million pages. It supports full Boolean expressions, although they are not mentioned in the documentation. It also supports the + and - symbols, quotes, and parentheses, and offers a form for searching on all the words, any of the words, or exact words, and for restricting searches by language, domain, title, text, and link.
This is another large Internet search engine with very powerful advanced features. It is about the same size as Northern Light, but lacks the latter’s special collections. It searches over 130 million web pages. Because of its size, your search should be carefully crafted, preferably in Boolean terms, or you may get too many hits to look through. The basic search can use + and - prefixes to require or exclude terms, and quotes to ensure adjacency (see below).
Advanced Search is its most powerful feature. It allows AND, OR, AND NOT, and NEAR (within 10 words) as Boolean expressions, as well as limiting by dates and other fields. Use of parentheses is encouraged to group expressions. It has many unique features, including search by specific language, rudimentary language translation, and sophisticated techniques for refining searches and ranking results. Use its field searches to search by link, url, domain, etc. It has also added a natural language search capability by licensing AskJeeves technology.
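The NEAR operator described above can be sketched in a few lines. The function name `near` is hypothetical, and the sketch assumes that "within 10 words" means the positions of the two words differ by at most 10; the engine's exact counting rule is not stated here.

```python
import re

def near(text, term_a, term_b, max_distance=10):
    """True if term_a and term_b occur within max_distance words of each other.

    Assumption: distance is the difference between word positions.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

close = "Search engines index the web"
far = "search " + "filler " * 20 + "web"

print(near(close, "search", "web"))  # True
print(near(far, "search", "web"))    # False
```

NEAR sits between AND and a quoted phrase: it is stricter than AND, which accepts the two words anywhere on the page, and looser than a phrase, which requires them side by side.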
- HotBot (http://www.hotbot.com)
HotBot, powered by Inktomi, seems to vary in size from 40 to 100 million pages, depending on how many servers are running. It supports full Boolean searching and recognizes AND, OR, and NOT (or the equivalent symbols &, |, and !), parentheses, double quotes, + and -, and other advanced features. It supports * and ? as wild cards, too. It also supports a system of modifying the first round search to refine it. HotBot and its parent Wired Digital are under agreement to be acquired by Lycos.
HotBot’s major strength is the “More Search Options” forms that support searching. You can craft complex searches without knowing Boolean expressions. This, after Google, is the power engine for the novice.
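As an illustration of the * and ? wild cards mentioned above, the sketch below uses Python's standard `fnmatch` module. It assumes the usual convention that * stands for any run of characters and ? for exactly one character; HotBot's precise rules may differ, and the helper name `wildcard_hits` and the word list are made up for the example.

```python
from fnmatch import fnmatchcase

def wildcard_hits(pattern, words):
    """Return the words that match a pattern containing * or ? wild cards."""
    return [w for w in words if fnmatchcase(w.lower(), pattern.lower())]

words = ["evaluate", "evaluating", "evaluation", "value", "woman", "women"]

print(wildcard_hits("evaluat*", words))  # ['evaluate', 'evaluating', 'evaluation']
print(wildcard_hits("wom?n", words))     # ['woman', 'women']
```

A trailing * is a handy substitute for stemming on engines that do not stem words automatically.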
- Lycos (http://www.lycos.com)
Lycos is a medium-sized search engine, searching over 50 million web pages as well as gopher and ftp sites. It has some very sophisticated features for controlling proximity and sequencing of search terms. It searches for graphics or sound files as separate choices. The basic search ORs all terms, but gives preference to results with the most hits. The Custom Search allows you to AND all terms, OR all terms, or require at least a selected number of terms. It also allows you to limit or include lower-scoring results.
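The "at least a selected number of terms" option can be pictured as a threshold between OR and AND. The sketch below is a hypothetical illustration, not Lycos code: the names `score` and `search` and the sample pages are invented, and ranking by distinct terms matched and then by total hits is an assumption.

```python
import re

def score(text, terms):
    """Return (distinct search terms found, total occurrences of those terms)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    wanted = {t.lower() for t in terms}
    matched = {w for w in words if w in wanted}
    hits = sum(1 for w in words if w in wanted)
    return len(matched), hits

def search(pages, terms, min_terms=1):
    """Keep pages containing at least `min_terms` of the terms, best first.

    min_terms=1 behaves like OR; min_terms=len(terms) behaves like AND.
    """
    results = []
    for name, text in pages.items():
        matched, hits = score(text, terms)
        if matched >= min_terms:
            results.append((matched, hits, name))
    # Prefer more distinct terms, then more total hits, then name order.
    results.sort(key=lambda r: (-r[0], -r[1], r[2]))
    return [name for _, _, name in results]

pages = {
    "a": "gopher and ftp archives",
    "b": "ftp ftp mirror list",
    "c": "sound files",
}
terms = ["gopher", "ftp", "archives"]

print(search(pages, terms, min_terms=1))  # ['a', 'b']
print(search(pages, terms, min_terms=2))  # ['a']
```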
- Google newsgroup search (http://groups.google.com/)
Formerly Deja.com, this service searches past and present newsgroup articles. It is particularly valuable when searched carefully, because newsgroup articles themselves include information and point to many resources that might not be found through web searches. It is one of the best ways to find a brand new web site that other engines have not yet indexed.
- Yahoo (http://www.yahoo.com)
Yahoo is the biggest of the subject-matter organized directories and has been imitated all over, particularly by LookSmart and the Open Directory Project. It is very useful for finding good collections of resources for a topic. It has an advanced search mode, too.
Why we don’t list many Metasearch engines.
The Metasearch engines, which combine searches of a number of the individual engines, have distinct limitations. They generally use the most basic level searches of each of the major engines, and do not handle complex search constructs well. They generally do not search Northern Light at all. If you are looking for a one-of-a-kind item on the net that can be uniquely defined with a few choice words, they may serve well. If you are looking for the best of many items, they are not likely to be appropriate. They are getting better, though, and as they gain the ability to use the more sophisticated search modes of each platform, we will revisit them. Progress is being made. For the best of them, take a look at MetaCrawler or DogPile or Inference Find.
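For readers curious about what "combining searches" involves, the following is a minimal sketch of one way a metasearch tool might merge ranked result lists from several engines. It does not describe how MetaCrawler, DogPile, or Inference Find actually work; the function name `merge_results`, the scoring rule (summing 1/position), and the example URLs are all assumptions.

```python
def merge_results(result_lists):
    """Merge ranked URL lists, removing duplicates.

    Each URL earns 1/position from every list it appears in, so a page
    returned by several engines, or near the top of one, rises in the
    merged list. Ties are broken alphabetically.
    """
    scores = {}
    for results in result_lists:
        for position, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / position
    return sorted(scores, key=lambda url: (-scores[url], url))

# Made-up results from two hypothetical engines.
engine_a = ["http://example.org/a", "http://example.org/b"]
engine_b = ["http://example.org/b", "http://example.org/c"]

print(merge_results([engine_a, engine_b]))
# ['http://example.org/b', 'http://example.org/a', 'http://example.org/c']
```

Because the same simple query is sent to every engine, a merge like this cannot take advantage of any one engine's advanced search modes, which is the limitation noted above.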
Some Specialized Collections.
- Argus Clearinghouse for Subject-Oriented Internet Resource Guides (formerly U of Michigan
As opposed to the general subject guides like Yahoo, the Clearinghouse is professionally oriented and assembled by either librarians or proven experts in the subject fields. There is an attempt to evaluate quality in these resources that cannot be brought to the search engines or the consumer-oriented collections like Yahoo and the search engine subject guides. Many of the newer guides have been rated by objective criteria, and older guides are retired if they are not updated in a year. You can approach the guides either by subject or by a full-text search of the guides.
- The World Wide Web Virtual Library.
This venerable effort to organize the web was started in the CERN high energy physics labs in Switzerland, the birthplace of the web itself. Like the Clearinghouse, its listings contain the best efforts of subject experts in each field. Unlike the Clearinghouse, there has been little centralized accountability to make sure that the resource listings are kept up to date. The quality and timeliness of the listings vary widely today. Visit the Virtual Library, but take a critical look at the quality and timeliness. Many of the best subject resource listings also appear in the Clearinghouse, but don’t assume all the quality lists are duplicated.
- Stumpers Archive.
The Stumpers Archive contains the full text of questions and answers posted to the Stumpers-L list since 1993. It is searchable by keywords in a gopher index. It doesn’t look pretty, but it is a treasure of answers and approaches to solving difficult reference questions, many of which pop up again and again. See the Stumpers-L web page, which includes a FAQ for participating in Stumpers-L and instructions for subscribing and unsubscribing.
- Usenet FAQ Archives.
Usenet FAQs are compiled by the experts who support specific newsgroups. Since the same questions occur again and again as new people start reading newsgroups, these Frequently Asked Questions files are developed to provide a ready source of answers. They are generally authoritative, subjected to the review of many participants in a subject field, and frequently updated.
- The Invisible Web.
- Great portions of the web are not accessible by the major search engines, but may be found in other ways. See our guide for searching the Invisible W