Spiders and Search Engines
Search engines are indispensable tools for information retrieval. They enable us to find information on the Web, which holds a massive wealth of it; without the proper tools to find that information, many of the Web's advantages would be out of reach. By searching with an engine like Google or Bing, we can find the knowledge we need in just a few minutes, or even a split second.
How Search Engines Return Search Results
When a user searches on a particular search engine, it returns a set of results. These results are displayed as links, ordered from top to bottom according to what the search engine judges to be the best matches for that particular search. Search engines like Google build these results by going through all the findable pages on the Web ahead of time; they do not scan the whole Web at the moment a search occurs. Instead, they return results from the copies of websites stored on their own infrastructure, which is built to identify relevant webpages quickly. This reduces response time while returning results that are more pertinent to the user's search.

It is also worth noting that when a user performs a search, the query is processed by the data centre closest to the user's location. On receiving the search, the query is run on many computers in that data centre working simultaneously to find the best-matching webpages. This increases performance and reduces the time taken to return the results to the user. In fact, the whole process takes less than half a second!
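The fan-out described above can be sketched as a "scatter-gather" pattern: send the query to every machine (here, every shard of a toy in-memory index) in parallel, then merge the partial answers into one ranked list. The shard contents, page names, and scores below are all invented for illustration; real search infrastructure is far more elaborate.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy index shards; in a real data centre each shard would live on
# its own machine (all page names and scores here are illustrative).
SHARDS = [
    {"page-a": 3, "page-b": 1},
    {"page-c": 5},
    {"page-d": 2, "page-e": 4},
]

def search_shard(shard, query):
    """Pretend each shard scores its own pages for the query."""
    return list(shard.items())

def scatter_gather(query):
    """Fan the query out to every shard in parallel, then merge the
    partial results into a single list ordered by score."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda shard: search_shard(shard, query), SHARDS)
        merged = [hit for partial in partials for hit in partial]
    return sorted(merged, key=lambda hit: hit[1], reverse=True)
```

Running `scatter_gather("malta hotels")` queries all three shards concurrently and returns the five pages ordered from highest score to lowest; the parallelism is what keeps the end-to-end time low even as the index grows.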
Finding Websites
Search engines crawl and discover websites using a special type of computer program known as a search engine spider (or robot). These spiders 'crawl' through a website by following its hyperlinks. When a search engine finds a link pointing to a website, it may decide to visit that site; during the crawl, the spider makes a copy of each visited page and stores it on the engine's infrastructure. The spider may then continue crawling other webpages residing on the same website, finding them through the site's navigation and content. The links between webpages are how spiders discover new pages. Hence, a page that is not linked from any other webpage on the site will not be found by search engines, and consequently will not be able to rank in search results for any search query.
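The crawl described above is essentially a breadth-first traversal of the Web's link graph. The sketch below uses a toy in-memory link graph (all page names are invented) in place of real HTTP fetching, and shows why an unlinked page is never discovered.

```python
from collections import deque

# Toy link graph standing in for the Web: page -> pages it links to.
# (All page names are illustrative.)
LINKS = {
    "home": ["about", "hotels"],
    "about": ["home"],
    "hotels": ["hotel-1", "hotel-2"],
    "hotel-1": [],
    "hotel-2": ["home"],
    "orphan": [],  # not linked from any page, so a spider never finds it
}

def crawl(start):
    """Breadth-first crawl: follow every hyperlink from pages already
    visited, storing a 'copy' of each page reached."""
    visited, queue = set(), deque([start])
    while queue:
        page = queue.popleft()
        if page in visited:
            continue
        visited.add(page)                  # store a copy of this page
        queue.extend(LINKS.get(page, []))  # follow its hyperlinks
    return visited
```

Starting from "home", the spider reaches every page connected by links, but "orphan" is never visited, which is exactly why an unlinked page cannot rank in search results.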
How Search Engines Rank Search Results
To keep it simple: during the crawling process, search engines copy pages and store them, creating a copy of the Web from which they can return results. Much like the index at the back of a book, a search engine builds an index recording which words can be found in which webpages. For instance, for a search query containing the words 'malta hotels', the engine consults its index, which points (or directs) it to the documents containing those words. The index may show that webpage 1 contains only 'malta' in its content, while webpage 2 may be considered more relevant since its content contains both 'malta' and 'hotels'. In this process, links pointing to each webpage, and the text used in those links, are also considered when finding relevant documents. After this process, the search engine has a set of results made up of webpages and other online documents that pertain to the user's search.
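The book-index analogy maps directly onto what is usually called an inverted index. The sketch below builds one over two toy pages (the page text is invented for illustration) and ranks pages by how many of the query's words they contain, so a page with both 'malta' and 'hotels' outranks one with 'malta' alone.

```python
# Toy pages; the content is invented purely for illustration.
PAGES = {
    "webpage-1": "visit malta this summer",
    "webpage-2": "the best malta hotels for your stay",
}

def build_index(pages):
    """Map each word to the set of pages containing it,
    like the index at the back of a book."""
    index = {}
    for page, text in pages.items():
        for word in text.split():
            index.setdefault(word, set()).add(page)
    return index

def lookup(index, query):
    """Rank pages by how many of the query's words they contain."""
    counts = {}
    for word in query.split():
        for page in index.get(word, ()):
            counts[page] = counts.get(page, 0) + 1
    return sorted(counts, key=counts.get, reverse=True)
```

For the query 'malta hotels', webpage-2 matches both words while webpage-1 matches only one, so `lookup` returns webpage-2 first. Real engines weigh far more than raw word counts, but the index lookup itself works just like this.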
Now the search engine must work out which documents are the most relevant and order them from top to bottom, placing the most relevant documents at the top of the first results page. Search engines such as Google take around 200 ranking signals into account. Examples of these signals include the number of links pointing to a page, the authority of the ranking domain, the relevance of the content, and many more.
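One simple way to picture how many signals combine into a single ordering is a weighted sum. The signal values and weights below are entirely hypothetical; real engines use roughly 200 signals with weights they do not publish.

```python
# Hypothetical per-page signal values on a 0-1 scale, and illustrative
# weights; real ranking formulas are proprietary and far more complex.
SIGNALS = {
    "webpage-1": {"links": 0.9, "authority": 0.6, "relevance": 0.4},
    "webpage-2": {"links": 0.5, "authority": 0.7, "relevance": 0.9},
}
WEIGHTS = {"links": 0.3, "authority": 0.3, "relevance": 0.4}

def score(page_signals):
    """Combine a page's signals into one ranking score as a weighted sum."""
    return sum(WEIGHTS[name] * value for name, value in page_signals.items())

def rank(pages):
    """Order pages from highest combined score to lowest."""
    return sorted(pages, key=lambda page: score(pages[page]), reverse=True)
```

With these made-up numbers, webpage-2's stronger relevance outweighs webpage-1's larger link count, so it ranks first: a reminder that no single signal decides the order on its own.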
Search engines are powerful and handy tools that give us access to virtual libraries and other massive pools of information from the comfort of our own homes.
Conrad Bugeja is a Search Optimisation Consultant and Pay-Per-Click Consultant at Alert eBusiness Internet Marketing Division - www.alertemarketing.com