Search engine friendly

How the search engines work

Anyone who uses search engines knows the frustration of typing in a phrase only to find the results return pages about similar, but unrelated, items.

The companies running the search engines are also aware of this, and to make their results more accurate they are turning to increasingly sophisticated information retrieval techniques.

Gone are the days when improving your ranking on the search engines was a simple case of repeating a number of keywords and phrases on your pages. In the past, to achieve page one ranking, all you needed to do was to have more keywords on your pages than any of your competitors. Now the search engines regard this as spamming. 

Google recommends that web pages should be written to be easily read by humans, as opposed to search engines. (Quality guidelines: www.google.com/support/webmasters)


Information Retrieval and the Web

Information retrieval uses a mathematical approach to determine the weighting of specific phrases in a website. 

Once the weighting is calculated, it can be compared with other websites to determine which is the most relevant. This is a far more accurate method of ranking web pages because the meaning, or semantics, of the page text is captured. This makes for a system which is far less open to abuse than simply weighing up the density of relevant keywords in the text.
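As a reminder of how crude the density measure is, here is a minimal sketch in Python (the page snippet and function name are made up for illustration): density is nothing more than a word count, which is exactly why repeating a keyword could once inflate a ranking.

import re

def keyword_density(text, keyword):
    """Fraction of the words in `text` that match `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z']+", text.lower())
    return words.count(keyword.lower()) / len(words) if words else 0.0

page = "Torches for cars. Our torches fit most cars and vans."
print(f"density of 'torches': {keyword_density(page, 'torches'):.2f}")   # 0.20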

Calculating the weighting of given keywords in a website varies with each search engine, but the methods are basically the same.

Here is the five-step process behind search engine information retrieval.


1: Linearisation

The average webpage contains more than the text you see. 

There is also code embedded in the content which tells the browser your visitors are using how to display the page. The first thing a search engine does, when it reads a page on your site, is remove this code, a process which is called linearisation. 

How linearisation affects you: The more code you have on each web page, the more difficult it is for the search engine to perform linearisation with any meaningful result. 

For example, if your page displays tables which are defined in HTML, that is, with the definition embedded in the page content, the search engine will remove the code defining the table. It will then read just the text.

Some search engines may read the text column by column, others may read it row by row. In other words, the search engine may not read the text in your table the way you intended. The best way to get around this is to use cascading style sheets. These allow you to put much of the formatting information in a separate document, leaving just a handful of codes in your content. This, in turn, makes the meaning of each page clearer to the search engines.
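As a rough sketch of this stripping step (not any particular engine's code), the short Python example below uses the standard library's html.parser to discard the markup of a table and keep only the text, in source order rather than the order in which the table is displayed.

from html.parser import HTMLParser

class Lineariser(HTMLParser):
    """Collects only the visible text, discarding tags and attributes."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)

    def text(self):
        return " ".join(self.chunks)

# A page laid out with an HTML table: the order of the cells in the source,
# not the visual layout, decides the order in which the text is read.
page = """
<table>
  <tr><td>Torches</td><td>for cars</td></tr>
  <tr><td>Spare bulbs</td><td>for vans</td></tr>
</table>
"""
parser = Lineariser()
parser.feed(page)
print(parser.text())   # Torches for cars Spare bulbs for vans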


2: Removal of 'stop words'

Once the search engine has performed linearisation, the next step it takes is to remove stop words from the text. 

These are words which appear very often, such as conjunctions and prepositions: words like “if”, “but”, “and” or “to”.

How the removal of stop words affects you: If you are using keyword density techniques to optimise your site, that is, if you have typed in your key phrase again and again, there is a very real danger that the search engine will see it as a stop word, too, and remove it. This would mean that when your target audience searched for those phrases or words on the web, your site would be invisible to them.
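The sketch below illustrates the idea with a tiny, hand-picked stop word list (real engines derive much larger lists from word frequency, and do not publish them); a phrase repeated too often on your own pages risks being filtered out in the same way.

# A minimal, assumed stop word list for illustration only.
STOP_WORDS = {"if", "but", "and", "or", "to", "the", "a", "of", "is", "in"}

def remove_stop_words(words):
    return [w for w in words if w.lower() not in STOP_WORDS]

text = "torches and spare bulbs to fit the lights of most cars"
print(remove_stop_words(text.split()))
# ['torches', 'spare', 'bulbs', 'fit', 'lights', 'most', 'cars']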


3: Local context analysis and the Lexicographical Tree

Next, the search engine aims to establish the context of the subjects within the page. 

Every sentence has a subject and a predicate. The subject is usually at the beginning of the sentence and refers to what the sentence is about. The predicate is the rest of the sentence that gives information about the subject. Within the predicate is usually a verb and an object. The object is normally the thing that is affected by the verb. 

Local context analysis attempts to determine the subject, verb and object of each sentence in each paragraph or page. 

Using each subject found, it collects all the associated objects and builds a two-tier hierarchy. Then, for each object, it collects all the associated verbs, synonyms and other words and builds the third tier of information.

This three tier hierarchy of information is known as a 'lexicographical tree'. 

For any given subject, the richer your description, the bigger the tree will be. Local context analysis will result in a relevancy score for each subject and object based on the size of the tree. 

This relevancy score will be used later to determine the overall weighting of your web page.

How building the lexicographical tree affects you: If the sentences in your text are not properly constructed, the search engine may not pick up on the interrelationship of the nouns and verbs in your text, and this may cause it to catalogue your site incorrectly.

This is why Google advises the use of good grammar and encourages the use of readable text, because the search engine picks out subjects, objects and verbs from each sentence.
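As a conceptual sketch only: assuming the sentences have already been parsed into (subject, verb, object) triples (a real engine would need a grammatical parser for that step), the lexicographical tree can be pictured as a nested structure, with a toy relevancy score taken from the size of each subject's branch.

from collections import defaultdict

# Hypothetical pre-parsed (subject, verb, object) triples; in practice these
# would come from local context analysis of the page text.
triples = [
    ("torch", "fits", "car"),
    ("torch", "lights", "road"),
    ("torch", "uses", "bulb"),
    ("bulb", "lasts", "year"),
]

# Tier 1: subject -> Tier 2: object -> Tier 3: associated verbs (a fuller
# version would also gather synonyms and other related words here).
tree = defaultdict(lambda: defaultdict(set))
for subject, verb, obj in triples:
    tree[subject][obj].add(verb)

def relevancy(subject):
    """Toy relevancy score: the number of nodes below the subject."""
    objects = tree[subject]
    return len(objects) + sum(len(verbs) for verbs in objects.values())

for subject in tree:
    print(subject, relevancy(subject))   # torch 6, bulb 2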


4: Latent Semantic Indexing

Latent semantic indexing looks at how the same or semantically similar words are used across the pages of a website.

Having established the relevancy of the subjects and objects in each web page, the next step is to look for other pages on the website that appear similar or have equally high relevancy scores for the same keywords. 

Part of this process investigates the use of synonyms to test how often the same or semantically similar words are used. This builds an index of semantics as the use of synonyms will help give the subject or object more meaning. 

This can result in a higher ranking for a given page that may not even contain an exact match for the search keywords in question. This is because the derived latent semantics discovered on this page may be more relevant. 
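The toy sketch below shows the general idea using plain NumPy and a hand-made term-document matrix (an illustration of the mathematics, not any search engine's actual implementation): a page that uses the synonym "flashlight" but never the word "torch" still scores highly for a "torch" query once the collection is reduced to its strongest concepts.

import numpy as np

# Term-document matrix: rows are terms, columns are pages.
# Page 0 says "torch", page 1 uses the synonym "flashlight", page 2 is about cake.
terms = ["torch", "flashlight", "car", "cake"]
A = np.array([
    [2, 0, 0],   # torch
    [0, 1, 0],   # flashlight
    [1, 1, 0],   # car
    [0, 0, 3],   # cake
], dtype=float)

# A truncated SVD keeps only the k strongest "concepts" in the collection.
k = 2
U, S, Vt = np.linalg.svd(A, full_matrices=False)
Uk, Sk, Vtk = U[:, :k], S[:k], Vt[:k, :]

# Fold the query "torch" into the same concept space.
query = np.array([1.0, 0.0, 0.0, 0.0])   # one-hot over `terms`
q_k = query @ Uk / Sk
docs_k = (Vtk * Sk[:, None]).T            # one row per page

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for i, d in enumerate(docs_k):
    print(f"page {i}: similarity to 'torch' = {cosine(q_k, d):.2f}")
# Page 1 never mentions "torch" yet scores highly, because "flashlight"
# appears in the same context ("car"); the cake page scores about 0.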

How Latent Semantic Indexing affects you: The better your description of the subjects and objects within each sentence, the more accurately the search engine can pin down what your site is about and hence determine the most relevant page. This will vastly increase the number of appropriate hits you will receive from the visitors you aim to attract. 

However, while varying your vocabulary is good, beware of using words in an unusual context, even if it is grammatically correct to do so, as it may skew your results. It is often useful to relate an object being written about to senses or visual images, especially in direct copywriting, but pick your words carefully.

For example, a page describing a children’s ABC poster recently stated that ordering through the company’s online shop was “a piece of cake”. Shortly afterwards the site statistics started showing visitors who had been searching for information about cake decorating. An alternative way of putting it, without losing the informal tone or diluting the relevancy of the page, might have been “ordering is as easy as ABC”.


5: Term Vector Analysis

Term vector analysis is a mathematical method of determining the relevancy of multiple terms or keywords.

Having carried out Local Context Analysis and Latent Semantic Indexing, the search engine applies a mathematical algorithm, called Term Vector Analysis, to the results to give each page a score for its total relevance, or weighting, to the search query.

This is performed by putting each keyword on an axis of a graph, and marking the relevancy score for each of those keywords for a given page. The result is a vector with an angle and a magnitude. This gives a method of comparing different pages for given combinations of keywords. The vector with the highest magnitude and the closest angle to the search query will have the largest weighting.

As an example, take the search terms "torches for cars". Compare the weighting for 'SITE A', with a relevancy score of 0.7 for torches and 0.2 for cars, with 'SITE B', with a relevancy score of 0.6 for torches and 0.5 for cars. Plotted as term vectors, 'SITE B' is clearly the closer match to the ideal weighting, and so 'SITE B' will be ranked higher by the search engines.
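The small sketch below reproduces that comparison in Python. The equal-weight query vector and the use of the projection onto the query direction as the combined weighting are assumptions for illustration; the search engines do not publish their exact formula.

import math

# Relevancy scores from the example above: (torches, cars).
sites = {"SITE A": (0.7, 0.2), "SITE B": (0.6, 0.5)}

# Assume the query "torches for cars" weights both terms equally.
query = (1.0, 1.0)
q_norm = math.hypot(*query)

for name, (torches, cars) in sites.items():
    magnitude = math.hypot(torches, cars)
    # Angle between the site vector and the query vector, in degrees.
    angle = abs(math.degrees(math.atan2(cars, torches) - math.atan2(query[1], query[0])))
    # One simple way to combine magnitude and angle: project the site vector
    # onto the query direction (its dot product with the unit query vector).
    weighting = (torches * query[0] + cars * query[1]) / q_norm
    print(f"{name}: magnitude={magnitude:.2f}, angle={angle:.1f} deg, weighting={weighting:.2f}")

# SITE A: magnitude=0.73, angle=29.1 deg, weighting=0.64
# SITE B: magnitude=0.78, angle=5.2 deg, weighting=0.78  -> ranked higher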

How Term Vector Analysis affects you: 

You need to build a high relevancy score for your important keywords. The calculated weighting for combinations of those keywords on your pages will then be higher, giving your web pages a greater chance of being found on the search engines.

Marketing Workshops

Try our marketing workshops for FREE

The first FREE introductory workshop will cover the Business Posture section of the Strategy Builder training programme. 

This 45 minute call will cover the following areas: 

  • Splitting your marketing activities into customer acquisition and customer nurturing strategies 
  • Adding clarity to your objectives by asserting a well-defined USP 
  • Generating a business posture by presenting value and results combined with your passion to deliver 

The complete workshop programme includes Marketing Analysis, Content Marketing, Information Architecture, Improving Communication and Measuring Success. These workshops can be arranged as 45-minute telephone / Zoom calls held once a week, at a time and day that fits. Companies who wish to take this further can arrange one-to-one training or join an existing workshop group held on Wednesdays or Thursdays.
