Search Engine Crawling

What is Search Engine Crawling?

Search engines crawl websites to find out what they contain, so that they can provide relevant results in response to users’ queries. Crawlers (also called bots or spiders) move across the web looking for new and updated information. When a crawler finds links on a page, it reports those URLs back to the search engine, which queues them to be visited later. These URLs can point to different pages on your website.

How to Help Search Engines Crawl Your Website?

There are two main ways to tell Google how to crawl and index your pages. One option is the robots.txt file, a text file that tells crawlers which parts of your site they may and may not crawl. The second is to use canonical and noindex tags, which tell search engines which version of a page to index and which pages to keep out of the index.
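To make the robots.txt option more concrete, here is a minimal sketch that uses Python's standard `urllib.robotparser` module to check which URLs a set of rules would allow. The rules, paths, and domain below are hypothetical and only serve as an illustration; a real file should reflect your own site structure. Note that Python's parser applies the first matching rule, while individual search engines may resolve conflicting rules differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths and domain are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /search
Allow: /

Sitemap: https://www.yourwebsite.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A product page is not covered by any Disallow rule, so it may be crawled.
print(parser.can_fetch("Googlebot", "https://www.yourwebsite.com/products/shoes"))  # True

# Anything under /cart/ is disallowed for every user agent.
print(parser.can_fetch("Googlebot", "https://www.yourwebsite.com/cart/checkout"))  # False
```

Keep in mind that a Disallow rule only asks crawlers not to fetch a URL. It does not by itself remove a page from search results.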

Canonicalization is most useful on domains that have a large number of pages with similar content. eCommerce stores, for example, may have several product pages that display the variants of a single item. Most of the content on those pages is likely to be identical, which causes duplicate content issues. By implementing canonicalization, you can avoid this and direct search engine bots to the canonical page: the one you want crawled, indexed, and appearing in Google search results.

In simple terms, a canonical tag tells crawlers that a URL is a variation of the canonical one, so they should index the canonical URL rather than the variant. Keep in mind that search engines see each URL that leads to a page on your website as different (for example: https://www.yourwebsite.com and http://www.yourwebsite.com).

Hence, the use of canonical tags isn’t limited to product and similar pages. You can (and should) implement it on each page.
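As a rough illustration, a canonical tag is usually written as `<link rel="canonical" href="...">` in the page's head, and a noindex tag as `<meta name="robots" content="noindex">`. The sketch below uses Python's standard `html.parser` to read both from a page, roughly the way an audit script might. The two sample pages and the `inspect_head` helper are hypothetical, not part of any SEO tool.

```python
from html.parser import HTMLParser


class HeadTagParser(HTMLParser):
    """Collects the canonical URL and the noindex directive from a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and "canonical" in (attr.get("rel") or "").lower().split():
            self.canonical = attr.get("href")
        elif tag == "meta" and (attr.get("name") or "").lower() == "robots":
            directives = [d.strip() for d in (attr.get("content") or "").lower().split(",")]
            self.noindex = self.noindex or "noindex" in directives


def inspect_head(html: str):
    """Return (canonical_url, noindex) for an HTML document."""
    parser = HeadTagParser()
    parser.feed(html)
    parser.close()
    return parser.canonical, parser.noindex


# Hypothetical product variant page that points at the main product URL.
VARIANT_PAGE = """
<html><head>
<link rel="canonical" href="https://www.yourwebsite.com/products/shirt">
</head><body>Red shirt</body></html>
"""

# Hypothetical internal search results page that should stay out of the index.
SEARCH_PAGE = """
<html><head>
<meta name="robots" content="noindex, follow">
</head><body>Search results</body></html>
"""

print(inspect_head(VARIANT_PAGE))  # ('https://www.yourwebsite.com/products/shirt', False)
print(inspect_head(SEARCH_PAGE))   # (None, True)
```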

Site Navigation

Search engine crawlers reach each page on your website through links, so if your website structure is solid, they should be able to understand your site hierarchy. In addition, using internal links to connect different pages and their content ensures they are all visible to the crawlers by providing a path of links they can follow.
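One way to picture this is to treat the site as a graph of pages connected by internal links and follow those links from the home page, much as a crawler would. The sketch below does a breadth-first walk over a small, hypothetical link map and reports pages that no link path reaches. The page paths and the `reachable_pages` helper are made up for illustration.

```python
from collections import deque


def reachable_pages(links, start):
    """Return every page reachable from `start` by following internal links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical internal link map: page -> pages it links to.
SITE_LINKS = {
    "/": ["/products", "/blog"],
    "/products": ["/products/shirt"],
    "/blog": ["/blog/post-1"],
    "/products/shirt": ["/"],
    "/blog/post-1": [],
    "/landing/old-promo": ["/products"],  # nothing links TO this page
}

# Pages that exist but cannot be reached from the home page.
orphans = sorted(set(SITE_LINKS) - reachable_pages(SITE_LINKS, "/"))
print(orphans)  # ['/landing/old-promo']
```

In this example the old promo page links out to the rest of the site, but no page links to it, so a crawler starting from the home page would not find it through links alone.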

Including external links to domains with high authority is a good practice, as long as you are linking to a relevant page that provides additional value to the reader, since it can help crawlers understand what the page is about.

Sitemaps and Why to Use Them

A sitemap is a list of all the pages on your website. It helps search engines find new content on your site more easily, especially content that isn’t linked from another page. If you’re using WordPress, you can set up a free plugin called Yoast SEO, which will generate a sitemap automatically.

Using both an HTML and XML sitemap can help crawlers understand all the content on your pages and crawl them more efficiently.
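If you are not using a plugin, an XML sitemap is simple enough to generate yourself. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`, following the `urlset`/`url`/`loc`/`lastmod` structure of the sitemaps.org protocol. The URLs, dates, and the `build_sitemap` helper are hypothetical examples.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build an XML sitemap string from (url, last_modified) pairs.

    `last_modified` is a date string such as "2024-01-15", or None to omit it.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages; replace with the URLs of your own site.
PAGES = [
    ("https://www.yourwebsite.com/", "2024-01-15"),
    ("https://www.yourwebsite.com/products/shirt", "2024-01-10"),
    ("https://www.yourwebsite.com/blog/post-1", None),
]

sitemap_xml = build_sitemap(PAGES)
print(sitemap_xml)
```

The resulting file is typically saved as sitemap.xml at the root of the site and can be referenced from robots.txt with a `Sitemap:` line.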

What is the Difference Between Crawling and Indexing?

Crawling is the process in which bots visit the URLs on your site, fetch their content, and look for new or changed pages and links. How often this happens varies from site to site. Indexing means that the search engine analyzes and stores the content of your pages so they can appear in Google search results. Indexing follows crawling and can lag behind it, but it still happens regularly.

The Crawl Budget

Your crawl budget is the approximate number of pages Google’s crawlers will crawl on your website in a given period. You can review Google’s recent crawl activity on your site in Google Search Console. Although this is rarely an issue for small sites, websites with thousands of pages may want to make sure Google doesn’t spend too much time crawling less important ones instead of discovering new content and any updates on their top pages. This can be done by implementing canonical or noindex tags on pages that you don’t want Google to index, and by disallowing pages you don’t want crawled in robots.txt.
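To see where crawl activity is going, one simple approach is to group the URLs a crawler requested by site section and compare their shares. The sketch below assumes you already have a list of crawled paths, for example extracted from your server logs. The sample paths and the `crawl_share_by_section` helper are hypothetical.

```python
from collections import Counter


def crawl_share_by_section(paths):
    """Return the share of crawl requests per top-level section of the site."""
    if not paths:
        return {}
    counts = Counter()
    for path in paths:
        # Drop the query string, then keep only the first path segment.
        segment = path.split("?")[0].strip("/").split("/")[0]
        counts["/" + segment] += 1
    total = sum(counts.values())
    return {section: count / total for section, count in counts.most_common()}


# Hypothetical paths requested by a crawler during one day.
CRAWLED_PATHS = [
    "/products/shirt",
    "/products/shirt?color=red",
    "/products/shirt?color=blue",
    "/products/hat",
    "/search?q=shirt",
    "/search?q=hat",
    "/blog/post-1",
    "/",
]

shares = crawl_share_by_section(CRAWLED_PATHS)
for section, share in shares.items():
    print(f"{section}: {share:.1%}")
```

In this made-up sample, a quarter of the requests go to internal search pages and two more go to colour variants of one product, which are the kinds of URLs the measures above are meant to deprioritize.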
