Google Index Reports: Our Comprehensive Guide to Understanding Site Errors

Dec 11, 2020

Not every page on your website is suited for Google’s index. As a marketer, it is your job to communicate your wishes to Google; otherwise, Google will decide for itself, and that often leaves an SEO fuming.

Thankfully, by using the Index Coverage report in Google Search Console, we can get a clearer picture of how Googlebot is interpreting your site.

There are a lot of different ways that Googlebot can interpret and subsequently classify a page. With over 25 different categories, it gets confusing trying to determine what they all mean.

Here is your complete guide to understanding your Google Index Report.

Valid Pages

Let’s start with the easy ones. Pages with a valid status have been indexed by Google and are good to go. There are two potential categories here:

  • Submitted and indexed: You submitted the URL for indexing, and it was indexed.
  • Indexed, not submitted in sitemap: The URL was discovered by Google and indexed. We recommend submitting all important URLs using a sitemap.
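If you don’t yet have a sitemap, here is a minimal sketch of how one could be generated with Python’s standard library; the URLs are placeholders for your own important pages.

```python
# Minimal sketch: build a basic sitemap.xml using only Python's standard library.
# The URLs below are placeholders; swap in the pages you want Google to index.
import xml.etree.ElementTree as ET

pages = [
    "https://www.example.com/",
    "https://www.example.com/services/",
    "https://www.example.com/blog/google-index-reports/",
]

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for page in pages:
    entry = ET.SubElement(urlset, "url")
    ET.SubElement(entry, "loc").text = page

# Write the file to your web root, then submit it in Search Console
# or reference it from robots.txt with a "Sitemap:" line.
ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```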

Potential Google Index Errors

  • Server error (5xx): The server returned a 500-level error when the URL was requested, so Googlebot could not crawl the page.
  • Redirect error: Google experienced a redirect error of one of the following types: a redirect chain that was too long; a redirect loop; a redirect URL that eventually exceeded the max URL length; or a bad or empty URL in the redirect chain. A sketch for tracing these redirects yourself follows this list.
  • Submitted URL blocked by robots.txt: You submitted this URL for indexing, but it is blocked by a rule in your robots.txt file.
  • Submitted URL marked ‘noindex’: You submitted this URL for indexing, but the page carries a ‘noindex’ directive in a meta tag or HTTP header.
  • Submitted URL seems to be a Soft 404: You submitted this URL for indexing, but the server returned what looks like a “page not found” response without an actual 404 status code.
  • Submitted URL returns unauthorized request (401): You submitted this URL for indexing, but Googlebot received a 401 (unauthorized) response, usually because the page requires a login.
  • Submitted URL not found (404): You submitted a URL for indexing that does not exist; the server returned a 404.
  • Submitted URL has crawl issue: You submitted this URL for indexing and Google encountered an unspecified crawling error that does not fall into any of the other categories.
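If you want to trace redirect errors yourself before resubmitting, a rough sketch along these lines can help; it assumes the third-party `requests` library is installed and the starting URL is a placeholder.

```python
# Rough sketch: follow a URL's redirects one hop at a time to surface chains,
# loops, empty targets, and over-long URLs before Googlebot reports them.
# Assumes the third-party `requests` library; the starting URL is a placeholder.
from urllib.parse import urljoin

import requests

MAX_HOPS = 10          # stop tracing after this many redirects
MAX_URL_LENGTH = 2048  # arbitrary ceiling for a "too long" redirect target

def trace_redirects(url):
    seen = set()
    for _ in range(MAX_HOPS):
        if url in seen:
            return f"Redirect loop detected at {url}"
        seen.add(url)
        response = requests.head(url, allow_redirects=False, timeout=10)
        if response.status_code not in (301, 302, 303, 307, 308):
            return f"Chain ends at {url} with status {response.status_code}"
        target = response.headers.get("Location")
        if not target:
            return f"Bad or empty redirect target at {url}"
        url = urljoin(url, target)  # resolve relative Location headers
        if len(url) > MAX_URL_LENGTH:
            return f"Redirect target exceeds {MAX_URL_LENGTH} characters"
    return f"Redirect chain longer than {MAX_HOPS} hops"

print(trace_redirects("https://www.example.com/old-page/"))
```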

Warning Status Pages

Pages with a warning status might require your attention, and may or may not have been indexed by Google, depending on the specific result.

Indexed, though blocked by robots.txt: 

The page was indexed despite being blocked by robots.txt. This is marked as a warning because it is not clear whether you intended to block the page from search results. If you do want to block this page, robots.txt is not the correct mechanism for keeping it out of the index.

To keep a page out of the index, you should either use a ‘noindex’ directive or prohibit anonymous access to the page with authentication. You can use the robots.txt tester to determine which rule is blocking this page. Because of the robots.txt block, any snippet shown for the page will probably be sub-optimal.

If you do not want to block this page, update your robots.txt file to unblock your page.
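As a quick local check (Google’s robots.txt tester remains the authoritative tool), Python’s built-in `urllib.robotparser` can tell you whether a given URL is disallowed for Googlebot; the domain and URL below are placeholders.

```python
# Quick local check of whether your robots.txt blocks Googlebot from a URL.
# Google's robots.txt tester is authoritative; this is only a first pass.
# The domain and URL are placeholders.
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")
parser.read()

url = "https://www.example.com/private/report.html"
if parser.can_fetch("Googlebot", url):
    print("Googlebot is allowed to crawl this URL.")
else:
    print("Blocked by robots.txt. Remember this does not prevent indexing;")
    print("use a 'noindex' directive or authentication for that.")
```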

Excluded Pages

These pages are typically not indexed, and we think that is appropriate. These pages are either duplicates of indexed pages, or blocked from indexing by some mechanism on your site, or otherwise not indexed for a reason that we think is not an error.

  • Excluded by the ‘noindex’ tag: When Google tried to index the page it encountered a ‘noindex’ directive and therefore did not index it. 

VERDICT: No action required, unless the page is meant to be indexed, in which case you will need to remove the ‘noindex’ directive.
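If you need to confirm whether a page really is serving ‘noindex’, a small sketch like the following checks both the X-Robots-Tag response header and the robots meta tag; it assumes the third-party `requests` library and uses a placeholder URL.

```python
# Sketch: check whether a page serves a 'noindex' directive, either as an
# X-Robots-Tag response header or a <meta name="robots"> tag in the HTML.
# Assumes the third-party `requests` library; the URL is a placeholder.
import re

import requests

def has_noindex(url):
    response = requests.get(url, timeout=10)
    # The directive can arrive as an HTTP header...
    if "noindex" in response.headers.get("X-Robots-Tag", "").lower():
        return True
    # ...or as a robots meta tag. A crude regex; an HTML parser would be sturdier.
    meta = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']*)["\']',
        response.text,
        re.IGNORECASE,
    )
    return bool(meta and "noindex" in meta.group(1).lower())

print(has_noindex("https://www.example.com/landing-page/"))
```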

  • Blocked by page removal tool: The page is currently blocked by a URL removal request. Keep in mind that removal requests are only good for about 90 days after the removal date. After that, Google may revisit and index that page even if you do not submit an index request. 

VERDICT: Ensure this URL was removed intentionally; if the removal is meant to be permanent, implement a ‘noindex’ directive.

  • Blocked by robots.txt: This page is blocked from Googlebot by a robots.txt file. You should double-check that these URLs are meant to be blocked. If you’re diligent with your exclusions, then in all likelihood it’s appropriate for these URLs to appear on this list. 

VERDICT: Not likely to require any action

  • Blocked due to unauthorized request (401): Googlebot is blocked from accessing this URL, most likely because the page is password protected.

VERDICT: Leave as is if the page is intended to be blocked from Googlebot. If not, the webmaster will need to remove the authorization requirements for that page so Googlebot can crawl it. That is done at the site level, outside of Search Console.

  • Crawl anomaly: This means Google experienced an unexpected issue when trying to crawl these URLs. These are quite often 4xx- or 5xx-level errors.

VERDICT: These links should be reviewed and updated as needed
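A quick way to triage the URLs in this report is to request each one and record its HTTP status code so 4xx and 5xx responses stand out; here is a minimal sketch, assuming the third-party `requests` library and placeholder URLs.

```python
# Minimal triage sketch: request each URL flagged in the report and print its
# HTTP status code so 4xx and 5xx responses are easy to spot.
# Assumes the third-party `requests` library; the URLs are placeholders.
import requests

urls = [
    "https://www.example.com/old-offer/",
    "https://www.example.com/downloads/brochure.pdf",
]

for url in urls:
    try:
        response = requests.get(url, timeout=10)
        print(f"{response.status_code}  {url}")
    except requests.RequestException as error:
        print(f"FAILED  {url}  ({error})")
```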

  • Crawled – currently not indexed: The page was crawled by Googlebot, but wasn’t added to its index. This can be attributed to a number of factors, and the page may still get indexed in the future. This can be the first sign of link bloat.

VERDICT: No need to resubmit this URL for crawling

  • Discovered – currently not indexed: The page was found by Google, but not crawled yet. Typically, Google tried to crawl the URL but the site was overloaded; therefore Google had to reschedule the crawl. 

VERDICT: No action required. These URLs will get crawled soon

  • Alternate page with proper canonical tag: This page is a duplicate of a page that Google recognizes as canonical. This is an example of a good exclusion. This tells us that Google is honouring our canonical link. 

VERDICT: This page correctly points to the canonical page, so there is nothing for you to do.

  • Duplicate without user-selected canonical: It appears that the page has duplicates, none of which is marked as the canonical. What this tells us is that the page is missing a canonical link and Google is deciding what to do with this page. This can often lead to good pages being skipped for indexing.

VERDICT: These pages should be reviewed and updated with a proper canonical link.
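One way to audit these pages is to pull the rel="canonical" link from each one and compare it with the URL you expect Google to index. Here is a rough sketch using the third-party `requests` library; the URLs are placeholders, and a proper HTML parser would be more reliable than the regular expression.

```python
# Rough audit sketch: extract the rel="canonical" link from a page and compare
# it with the URL you expect Google to index. Assumes the third-party `requests`
# library; the URLs are placeholders, and a real HTML parser would be sturdier.
import re

import requests

def get_canonical(url):
    html = requests.get(url, timeout=10).text
    match = re.search(
        r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)["\']',
        html,
        re.IGNORECASE,
    )
    return match.group(1) if match else None

page = "https://www.example.com/red-widget/"
expected = "https://www.example.com/red-widget/"
canonical = get_canonical(page)

if canonical is None:
    print("No canonical link found; Google will choose one for you.")
elif canonical != expected:
    print(f"Canonical points to {canonical}, expected {expected}.")
else:
    print("Canonical matches the expected URL.")
```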

  • Duplicate, Google chose different canonical than user: This page is marked as canonical for a set of pages, but Google thinks another URL makes a better canonical. This is not always in the best interest of the webmaster; Google can mistake closely related product pages for duplicates of one another. 

VERDICT: Review your canonical links and check for omissions from Google’s index. You may need to rewrite your copy to make that page more distinct.

  • Not found (404): Google discovered this URL without it being submitted for crawling, but the page returned a 404 error when Googlebot requested it.

VERDICT: Review these links and determine if an update or redirect is necessary.

  • Page removed because of legal complaint: The page was removed from the index because of a legal complaint.

VERDICT: The webmaster needs to identify the origins of the complaint, which could be anything from copyright infringement to explicit content or other factors outlined in Google’s policies. Once the issue has been resolved, they will need to “create a request”, and Google will review and verify any changes. If the problem has been fixed, Google may include the page in the index again; if not, it will remain excluded. Note that the page might remain excluded even after the problem is fixed. Google has the final say.

  • Page with redirect: Google crawled a URL that redirects to a different destination. 

VERDICT: No action required

  • Soft 404: The page returns what Google considers a soft 404: it tells the visitor the content cannot be found (or contains little to no content) but does not return a 404 HTTP status code.

VERDICT: These links should be updated or redirected
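A common symptom is a page that returns a 200 status code but shows a “not found” message or has almost no content. The sketch below uses crude heuristics only (with the third-party `requests` library, a placeholder URL, and arbitrary thresholds) to flag candidates worth reviewing.

```python
# Crude heuristic sketch: flag likely soft 404s, i.e. pages that return a 200
# status but look like an error page or have almost no content.
# Assumes the third-party `requests` library; the URL and thresholds are
# placeholders, so tune them for your own site.
import requests

ERROR_PHRASES = ("not found", "no longer available", "page does not exist")
MIN_CONTENT_LENGTH = 512  # arbitrary threshold for a "thin" page

def looks_like_soft_404(url):
    response = requests.get(url, timeout=10)
    if response.status_code != 200:
        return False  # a real error status is not a *soft* 404
    body = response.text.lower()
    if any(phrase in body for phrase in ERROR_PHRASES):
        return True
    return len(body) < MIN_CONTENT_LENGTH

print(looks_like_soft_404("https://www.example.com/discontinued-product/"))
```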

  • Duplicate, submitted URL not selected as canonical: You submitted this URL for indexing, but because it is a duplicate and Google thinks another URL makes a better canonical, Google did not index this URL. Instead, Google indexed the canonical it selected.

VERDICT: Explicitly specify your preferred canonical for this URL to resolve the issue.

Understanding Google’s language is the first step in ensuring that your links are being properly indexed.

We know that Google Index Reports can appear confusing and hard to sort through, but use this guide to help decode the sometimes confusing link errors, and remember that Google relies on its own indexing system.

Understand where you can make changes so that you’re telling Google Index exactly what you want instead of letting it decide. 

Once you’ve properly indexed your links, see how you can interpret your page speed insights with our free page speed insight tool. 
