Having your website crawled is an important step in the planning process, but when you have many pages to crawl, optimizing your website helps crawl bots single out your site as important, keep it near the top of their crawl list, and ultimately make it more searchable.

What is crawl budget? 

A crawl budget is the number of pages a bot will crawl on a website within a time period. 

Crawl budget is determined by two factors: 

  • Crawl Capacity Limit: How much a bot will crawl your site without overwhelming your server. Capacity is affected by a few things: the health of your site, your personal settings, and the host's crawling limits. The quicker your site responds, the more connections crawlers will use on your site; the slower it responds, the less they will crawl. Website owners can also choose to decrease crawling on their site (requesting an increase is an option, however it is not guaranteed).
  • Crawl Demand/Crawl Scheduling: Which URLs are most worth recrawling, based on popularity and how often the site is updated. 

The more often bots visit your site, the quicker they will index your pages, cutting down the time you spend on future crawl optimization efforts. A sitemap also helps with indexing by making it easier for bots to understand your website. 

To make the most of your crawl budget, you will want to manage your URLs so that they are clear to the bots.  

The best way to optimize your crawl budget is to manage your URLs 

Site owners can tell search engines which pages to crawl and which pages they can skip. Managing URLs helps these bots and spiders understand which pages on your website are important for indexing. 

If your site is crawled with too many pages that cannot be indexed (e.g. blog author pages, blog category pages, pages returning 404 errors), the search engine might decide that your website isn't worth crawling regularly. Managing your URLs can help Google understand and index your site better. 

Eliminate duplicate content 

Duplicate content is the same content appearing under two (or more) different URLs, so it shows up multiple times in the search results. Consolidate duplicate content so that crawlers focus on crawling unique content rather than duplicate URLs.  
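One common way to consolidate duplicates is a canonical link tag in the page's head, which tells crawlers which version of the page is the preferred one. A minimal sketch, with a placeholder URL:

```html
<!-- Placed in the <head> of the duplicate page; the URL is a placeholder. -->
<link rel="canonical" href="https://example.com/preferred-page" />
```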

Block URLs you don’t want indexed with robots.txt

Things like infinite-scrolling pages of information and duplicate pages may be useful for the user but aren't always pages you want to show up in the search results. Exclude pages that are not important by blocking them via the robots.txt file, sending a signal to Google not to crawl those pages. 
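For example, a robots.txt file at the root of your domain might look like this (the paths and domain are placeholders, assuming a typical blog structure):

```
# Placeholder rules -- adjust the paths to your own site structure.
User-agent: *
Disallow: /blog/author/
Disallow: /blog/category/
Disallow: /search

Sitemap: https://example.com/sitemap.xml
```

Keep in mind that robots.txt controls crawling, not indexing: a blocked page can still appear in results if other sites link to it, so a noindex meta tag is sometimes the better tool for keeping a page out of the index entirely.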

Use a 404 or 410 for permanently removed pages 

If you have permanently deleted a page, a 404 (Not Found) or 410 (Gone) status sends a direct signal to Google not to crawl that URL again. This is important in stretching crawl budget, because it stops crawlers from wasting time on URLs that no longer exist. 
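How you return the status depends on your server. As one sketch, an nginx rule (with a placeholder path) could look like this:

```nginx
# Placeholder path -- respond with 404 for a permanently removed page.
location = /old-removed-page {
    return 404;
}
```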

Update your sitemaps frequently

A sitemap is a great tool to help search engines crawl, understand, and index your website, and to see where internal links on your site lead. Google reads it frequently, so the more accurate and up to date it is, the better. This way Google can understand your site quickly, with no time wasted guessing. 
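A minimal XML sitemap (with placeholder URLs and dates) looks like this; the `lastmod` field is what tells crawlers a page has changed since the last visit:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/important-page</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/another-page</loc>
    <lastmod>2024-02-01</lastmod>
  </url>
</urlset>
```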

Site speed matters

We say it often, but site speed matters. If Google can load and render your site more quickly, it can read your content more efficiently. Ensure that your high-quality content pages are optimized for speed (and mobile) first, as Google favours high-quality content pages. 

Avoid long redirect chains and loops

Too many redirects discourage Google's bots from continuing to crawl your site: Google may give up if it encounters a long chain, and each extra hop also slows down your page loading time. 
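To illustrate the problem, here is a small Python sketch (with a made-up redirect map) that follows a chain of redirects and gives up on chains that are too long or that loop back on themselves, roughly as a crawler would:

```python
def follow_redirects(start, redirect_map, max_hops=5):
    """Follow redirects from `start`, returning (final_url, hop_count).

    Raises ValueError on a loop or when the chain exceeds max_hops,
    which is roughly the point where a crawler gives up.
    """
    seen = set()
    url, hops = start, 0
    while url in redirect_map:
        if url in seen:
            raise ValueError(f"redirect loop at {url}")
        seen.add(url)
        url = redirect_map[url]
        hops += 1
        if hops > max_hops:
            raise ValueError("redirect chain too long")
    return url, hops

# Made-up example: /a -> /b -> /c is a chain; /x <-> /y is a loop.
redirects = {"/a": "/b", "/b": "/c", "/x": "/y", "/y": "/x"}
print(follow_redirects("/a", redirects))  # ('/c', 2)
```

The fix on a real site is the same as in the sketch: point every redirecting URL straight at its final destination so each chain collapses to a single hop.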

So where exactly should you start? Google offers a free tool that shows your website's crawling history: the Crawl Stats report in Google Search Console. It will show where Google has had issues when crawling your site, so you can then address those issues. 

Crawl budget is still an important part of your SEO strategy and can still enhance your overall performance: one of the building bricks of the SEO castle. 

Want more SEO insights, strategy and updates? Head over to our blog for content updated weekly.