Search Engine Optimisation (SEO): Website Crawling

As we have discussed previously, producing search engine results involves several distinct processes, and many factors must be considered to build and maintain a high-performing website.

  • Crawling. Can search engines find your pages and read their contents?
  • On-Page SEO. Creating compelling content which answers searchers’ queries and ranks highly for relevant keyphrases.
  • Technical SEO. Building your site in a way that maximises natural search performance.
  • Link building and establishing authority. Also known as off-page SEO. Building the authority of your site through link building and online PR.

Crawling: Can Search Engines Find Your Pages?

Before your website can appear in the search engine results, it must appear in the search engine’s index. The first task in optimising your site should be to establish whether any issues prevent it from being fully indexed. Many sites are constructed in a way that prevents search engines from indexing some or all of their pages.

Analysing Current Performance

It is essential to establish the proportion of your site’s pages that are indexed. Do this by comparing the number of pages on your website (as measured by your content management system) with the number appearing in the search engine’s index. To find the number of pages from your website which Google has indexed, consult the Google Search Console (see below). To estimate the proportion of indexed pages, divide the number of indexed pages by the total number of pages.
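The calculation can be sketched in a few lines of Python. The function name and the page counts below are hypothetical, chosen only to illustrate the division described above.

```python
def index_coverage(indexed_pages: int, total_pages: int) -> float:
    """Return the share of a site's pages that appear in the index (0.0 to 1.0)."""
    if total_pages <= 0:
        raise ValueError("total_pages must be a positive number")
    return indexed_pages / total_pages


# Hypothetical figures: 1,200 pages in the CMS, 900 reported as indexed.
coverage = index_coverage(indexed_pages=900, total_pages=1200)
print(f"{coverage:.0%} of pages indexed")  # prints "75% of pages indexed"
```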

It takes time for a search engine to refresh its index, so if your site is large or changes often, 100% inclusion is unlikely. However, if the number of pages indexed is well below what you expect, this should be investigated (see below). If your site is not appearing at all, there could be a few reasons for this:

  • New site that has not yet been crawled.
  • No links from external websites.
  • Site navigation makes it difficult for a robot to crawl it.
  • ‘Crawler directives’ are blocking search engines, e.g., a robots.txt file.
  • Site penalised by Google for ‘Black Hat’ tactics.

Google Search Console and Google Sitemaps

Google has created a suite of tools called Google Search Console (GSC) with the mission of ‘helping you measure your site’s search traffic and performance, fix issues and make your site shine in Google Search results’. The tools supplied include:

  • Coverage reports. See the number of pages that Google is indexing.
  • Search performance. Natural search clicks and performance as measured by Google.
  • Website analysis. Google’s analysis of the quality of your website, e.g., usability and page speed.
  • Google sitemaps. An XML file which, when submitted to Google, provides details of the pages on your site.

Submitting a Google sitemap is crucial to improving your indexing and should be a priority soon after a site launch. Sitemaps are generated automatically by all decent CMS and eCommerce systems and can be quickly submitted to GSC.
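If your system does not generate a sitemap for you, the file itself is simple. The following is a minimal sketch, assuming a plain list of page URLs; the function name and the example.com addresses are hypothetical, and a real sitemap may carry extra optional fields per page.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Build a minimal XML sitemap string from a list of page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    # Prepend the XML declaration; save the result as a UTF-8 file.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages, for illustration only.
print(build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/about",
]))
```

The resulting text would typically be saved as a file on the site and its address submitted through GSC.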

Indexing Problems

If your site is not being indexed or has a low percentage of indexed pages, one of the following problems may be present:

Robots.txt File

There are some good reasons why you would not want a search engine to index sections of your site, for example, admin and checkout pages and duplicate content.

A robots.txt file is found in the root directory of each website and tells Google which parts of your site it should and should not crawl. A problem with the robots.txt file may cause Google to stop crawling your site.
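One way to check what a robots.txt file blocks is Python’s standard `urllib.robotparser` module. The rules and paths below are hypothetical, written only to mirror the admin and checkout example; this is a sketch of how the directives are read, not a statement of how any particular search engine applies them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: block admin and checkout pages for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /checkout/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(path, user_agent="Googlebot"):
    """Return True if the parsed rules allow this user agent to fetch the path."""
    return parser.can_fetch(user_agent, path)


for path in ["/blog/seo-crawling", "/admin/login", "/checkout/basket"]:
    print(path, "allowed" if is_crawlable(path) else "blocked")
```

Running a check like this against your own file is a quick way to spot a rule that blocks more of the site than intended, such as a stray `Disallow: /`.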

Inaccessible Content

Your content could be inaccessible to search engine crawlers for several reasons:

  • Search required. If the content is only accessible via a search, the content will not be accessible to the crawler.
  • Non-text content. Do not use non-text media formats (images, video, GIFs, etc.) to display text that you wish to be indexed.
  • Non-HTML menus. Search engines have difficulty reading menus that are not written in HTML, e.g., menus generated with JavaScript.
  • Navigation dead ends. Crawlers only find pages that are linked to from other pages.
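The last point can be illustrated with a small sketch. A crawler follows links outward from a starting page, so any page that nothing links to is never reached. The link map below is hypothetical, and a real crawler would fetch and parse pages rather than read a dictionary.

```python
from collections import deque


def reachable_pages(links, start):
    """Follow links breadth-first from start and return every page reached."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: each page maps to the pages it links to.
SITE_LINKS = {
    "/": ["/products", "/blog"],
    "/products": ["/products/widget"],
    "/blog": ["/"],
    "/products/widget": [],
    "/old-landing-page": ["/"],  # links out, but no page links to it
}

orphans = set(SITE_LINKS) - reachable_pages(SITE_LINKS, "/")
print(sorted(orphans))  # prints ['/old-landing-page']
```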

Server Errors

When crawling the pages on your site, a crawler may encounter errors. Google Search Console’s ‘Crawl Errors’ report lists URLs on which there are server errors and ‘not found’ errors. A moved page will lose its ranking unless the site sets a ‘301 redirect’ to direct the page visitor to the new location. A redirect will ensure that the new page is indexed correctly and that its authority is maintained.
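In practice, redirects are usually configured in your web server or CMS. As a rough illustration of what a 301 redirect does, the sketch below uses Python’s built-in `http.server`; the `MOVED_PAGES` mapping, the paths, and the handler name are all hypothetical.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical mapping of old page paths to their new locations.
MOVED_PAGES = {
    "/old-page": "/new-page",
}


def lookup(path):
    """Return (status_code, location) for a requested path."""
    if path in MOVED_PAGES:
        return 301, MOVED_PAGES[path]  # moved permanently
    return 404, None  # a real site would serve its normal pages here


class RedirectHandler(BaseHTTPRequestHandler):
    """Answer requests for moved pages with a 301 and a Location header."""

    def do_GET(self):
        status, location = lookup(self.path)
        self.send_response(status)
        if location:
            self.send_header("Location", location)
        self.end_headers()
```

The 301 status tells both visitors’ browsers and crawlers that the move is permanent, and the `Location` header tells them where the page now lives.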


