Search engines: a beginner's guide
1991: the World Wide Web, developed by Tim Berners-Lee at CERN, was born!
1993: Mosaic, the first popular graphical browser (by NCSA, the National Center for Supercomputing Applications), was born!
1994: Netscape Navigator was born...

Now, just 7 years later, there are millions of web pages on the net: millions of pieces of information, and finding something is often a problem. This is why search engines were developed. A search engine is a piece of software that searches for information on the net by means of keywords.

First of all, you should know that almost all search engines (with a few exceptions) search for information only on the web. But the Internet is not only the web, so with a search engine you can't find, for example, information contained in listserv databases or in gopherspace. So remember: there are several places where you can find information, such as listserv databases, gopherspace, newsgroups, Archie servers, WAIS and others, but you generally can't retrieve information from these places by means of a search engine, because search engines search only the web.

How do search engines work? Search engines use programs called robots, crawlers, worms or spiders that walk the web, read HTML pages, analyze them and extract 'keywords'; the search engine then updates its database with those keywords and answers users' requests. How can they walk the web? They simply follow the links contained in the pages they encounter. So, following link after link, they can reach a large part of the web! However, a page that is not linked from any other page may never be picked up. When a crawler reads a web page, it analyzes that page, searching for keywords. Then it follows that page's links, so it reaches another web page, analyzes it, follows its links, and so on.

What are keywords? Keywords are all the words in a web page that a search engine considers 'important'. Extracting them is called 'indexing': these important words become indexes pointing to that page.
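The crawl-and-index loop described above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: the "web" is a hypothetical in-memory dict mapping URLs to HTML (a real crawler would fetch pages over HTTP, resolve relative links, and respect robots.txt), and every word on a page is treated as a keyword. The `crawl` function and the sample pages are made up for illustration.

```python
import re
from collections import deque
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects the link targets and the visible text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def crawl(web, start):
    """Breadth-first walk from `start`, following links between pages.

    `web` is a hypothetical stand-in for the real web: a dict mapping a
    URL to that page's HTML. Links are used as keys as-is (no relative
    URL resolution). Returns an inverted index: word -> set of URLs.
    """
    index = {}
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        html = web.get(url)
        if html is None:  # broken link: nothing to read
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()  # flush any trailing text still buffered
        # Indexing: every word on the page becomes a key pointing to this URL.
        for word in re.findall(r"[a-z0-9]+", " ".join(parser.text).lower()):
            index.setdefault(word, set()).add(url)
        # Follow the page's links to reach pages not seen yet.
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


web = {
    "/home": '<h1>Cats</h1><a href="/cat">my cat</a> <a href="/dog">dogs</a>',
    "/cat": '<p>The nose of the cat is pink.</p><a href="/home">home</a>',
    "/dog": "<p>A red dog</p>",
    "/orphan": "<p>An orphan page nobody links to</p>",
}
index = crawl(web, "/home")
print(sorted(index["cat"]))  # ['/cat', '/home']
print("orphan" in index)     # False: no page links to /orphan
```

The `/orphan` page shows the limitation mentioned above: since no crawled page links to it, the crawler never reads it, and none of its words reach the index.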
Some search engines index whole web pages ('full text' search engines), so almost every word in a page becomes an index. However, some words, called 'stopwords', are ignored by search engines: articles, prepositions and other words or symbols that carry no meaning for a search engine. Let's look at this query:

the nose of the cat

When you type the above query into a search engine's form and press the submit button, the search engine first transforms your query into:

nose cat

In fact, it assumes that some words (in this case 'the' and 'of') are not important to you. For this reason, this query will return all web pages containing the word 'nose' or the word 'cat'. In other words, you will get a lot of pages and a lot of useless information. However, some search engines will return only documents containing both words: each search engine follows its own rules.

But what happens if you want to find a book's title exactly as written? Suppose you want to find all web pages containing the book's title:

the nose of the cat

How can you force search engines to consider articles and prepositions too? You can put your query within double quotes. So when you write:

"the nose of the cat"

you will get only documents containing the exact phrase. Now you know that some words are ignored by search engines. Yes, but which words are important to them? Usually, search engines follow some rules:
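The stopword filtering, the OR/AND behaviour and the double-quote phrase search described above can be illustrated with a small Python sketch. The stopword list, the `rewrite_query` and `search` helpers and the sample documents are all hypothetical; as noted, each real search engine has its own stopword list and its own rules.

```python
import re

# A hypothetical, tiny stopword list; real engines each use their own.
STOPWORDS = {"the", "of", "a", "an", "and", "in", "on", "to"}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rewrite_query(query):
    """Drop stopwords, e.g. 'the nose of the cat' -> ['nose', 'cat']."""
    return [w for w in tokenize(query) if w not in STOPWORDS]


def contains_phrase(words, phrase):
    """True if `phrase` appears in `words` as consecutive words."""
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def search(docs, query, mode="or"):
    """Return the sorted ids of the documents matching `query`.

    A query wrapped in double quotes is an exact-phrase search that keeps
    stopwords; otherwise stopwords are dropped and the remaining words are
    combined with OR (any word matches) or AND (all words must match).
    """
    query = query.strip()
    results = []
    for doc_id, text in docs.items():
        words = tokenize(text)
        if len(query) >= 2 and query[0] == query[-1] == '"':
            match = contains_phrase(words, tokenize(query[1:-1]))
        else:
            terms = rewrite_query(query)
            combine = all if mode == "and" else any
            match = bool(terms) and combine(t in words for t in terms)
        if match:
            results.append(doc_id)
    return sorted(results)


docs = {
    1: "The nose of the cat is pink",
    2: "My cat sleeps all day",
    3: "A dog with a cold nose",
    4: "The cat and the nose of the dog",
}
print(rewrite_query("the nose of the cat"))             # ['nose', 'cat']
print(search(docs, "the nose of the cat"))              # [1, 2, 3, 4]
print(search(docs, "the nose of the cat", mode="and"))  # [1, 4]
print(search(docs, '"the nose of the cat"'))            # [1]
```

Document 4 contains both 'nose' and 'cat', so it matches the AND search, but not in the exact order of the title, so the quoted phrase search skips it.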
So when you submit your query, think about the title of the web page you would like to find: try to 'guess' the words it contains! You should also use boolean operators: AND, OR and NOT. For example, suppose you are searching for 'red cars'. If you type:

red cars

you could get a huge list of links to documents found by the search engine, because it may return documents containing the word red OR the word cars. But you want both of them: red AND cars! So you have to use the boolean operator AND:

red AND cars

And what if you want to find 'red cars' but you do NOT want to find 'ferrari'? Then you have to use the boolean operator NOT:

red AND cars NOT ferrari

However, these are just examples, and you should learn the rules of your favorite search engine. So when you search for something, you should follow these steps:
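The boolean examples above can be sketched with an inverted index, where AND, OR and NOT become set intersection, union and difference. This is a hypothetical sketch, since every engine has its own syntax. It assumes that operators are written in uppercase, that the query starts with a word, that it is evaluated left to right without parentheses or precedence, that 'x NOT y' means 'x AND NOT y', and that two words with no operator between them are combined with OR, as in the 'red cars' example. The helper names and sample documents are made up.

```python
import re


def build_index(docs):
    """Inverted index: lowercase word -> set of ids of docs containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(doc_id)
    return index


def boolean_search(index, query, default_op="OR"):
    """Evaluate a query such as 'red AND cars NOT ferrari' left to right.

    Assumptions: uppercase operators, the query starts with a word, no
    precedence or parentheses, 'x NOT y' means x AND NOT y, and adjacent
    words with no operator between them are joined by default_op.
    """
    result = None
    op = None
    for token in query.split():
        if token in ("AND", "OR", "NOT"):
            op = token
            continue
        matches = index.get(token.lower(), set())
        if result is None:
            result = set(matches)  # copy, so the index is never modified
        else:
            op = op or default_op
            if op == "AND":
                result &= matches  # intersection: both words
            elif op == "OR":
                result |= matches  # union: either word
            else:
                result -= matches  # NOT: drop docs with this word
        op = None
    return sorted(result or set())


docs = {
    1: "red cars for sale",
    2: "a red ferrari and other cars",
    3: "blue cars",
    4: "red roses",
}
index = build_index(docs)
print(boolean_search(index, "red cars"))                  # [1, 2, 3, 4]
print(boolean_search(index, "red AND cars"))              # [1, 2]
print(boolean_search(index, "red AND cars NOT ferrari"))  # [1]
```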
To learn more about the rules of some important search engines, see the following wowarea page: help for some search engines.
Copyright (c) 1998-2006 Wowarea