While things like poor content quality, duplicate content, and blocked pages due to technical issues will require immediate attention, it may also be that the page was blocked from indexing for a good reason, and you don’t need to do anything at all.
Your first step in understanding why your page isn’t being indexed is to explore your Google Search Console report. Search console warnings can help you understand why certain pages aren’t appearing in search results and what steps, if any, you should take to fix the issue.
So, let’s take a deep dive into why pages aren’t indexed and what common search console warnings mean. Then we’ll explore the steps you’ll need to take to ensure that your pages are indexed properly and which warnings may not require any action at all.
How Search Engines Index Web Pages
Let’s start with a quick review of the basics. Before Google can index your pages, it uses automated software, commonly known as Googlebot, to crawl your web pages and gather information about them.
The crawler reads the content of the page and follows any links it finds. The process is repeated for each link it follows, as well as for any page that is submitted for indexing, allowing Google to build an index of web pages across the internet.
When deciding how to index a page, Google’s algorithms analyze the relevancy and quality of each page, taking into account things like content quality, the page’s popularity, schema markup, and the value of any internal, outbound, or incoming links.
When a user performs a search, Google’s algorithm refers to this index to return results based on how well the page matches the user’s search query. Pages that are deemed the most relevant are listed first in SERPs, followed by less relevant pages in descending order.
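The crawl-and-follow process described above can be sketched in a few lines of Python. This is a simplified illustration of link discovery using the standard library's HTML parser, not a representation of how Googlebot actually works:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, mimicking how a crawler discovers links."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links

# A crawler starts from known pages, fetches each one, extracts its links,
# and queues any link it hasn't seen before, repeating until the queue is empty.
page = '<p>See <a href="/about">About</a> and <a href="/blog">Blog</a>.</p>'
print(extract_links(page))  # → ['/about', '/blog']
```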
Why Some Pages Shouldn’t Be Indexed
When confronted by a long list of search console warnings, it’s easy to get overwhelmed. But it’s important to remember that some pages shouldn’t be indexed, and it may be ok for some of those warnings to be there.
For example, duplicate or alternate pages shouldn’t be indexed. A non-indexed page that’s marked as duplicate likely indicates that Google has found and indexed the correct canonical page and added it to the index.
If you’re concerned, you can use the URL Inspection tool to verify that the correct canonical has been indexed. If everything looks good, it’s ok for these warnings to be there, and no action is required.
Another example is when a page requires a login because it isn’t intended for public viewing, such as shopping cart or account pages that contain sensitive information. In some cases, the page has been intentionally blocked from indexing with a “noindex” tag for a specific reason, such as maximizing the crawl budget on very large websites.
When the page has been blocked from indexing for a valid reason, it’s ok for the warning to remain in your Index Coverage Report, and no further action is necessary.
Common Causes of Indexing Issues
Some of the most common causes of indexing issues are duplicate content without a proper canonical tag, blocked page access, an incorrect robots.txt file, poorly implemented redirects, and rendering issues related to JavaScript.
In some cases, Google simply doesn’t know that the page exists. This could be because it’s new, it hasn’t been added to the sitemap, or Googlebot simply hasn’t come across a link to the page. Keep in mind that it can take weeks for new pages to be crawled, even when you submit a crawl request.
Google may also choose not to index poorly optimized content or thin content that doesn’t contain enough useful information. Ensuring that your pages cover the topic thoroughly, are properly optimized, load correctly, and are accessible is the key to avoiding indexing issues.
We’ll explore all of this in more detail below, but first, let’s dig into the basics of how to navigate your Search Console Dashboard and understand your Index Coverage Report.
Navigating Your Google Search Console Dashboard
Your Google Search Console Dashboard might seem a little overwhelming at first, so here’s a quick breakdown of what the different sections mean and how to utilize them.
- Overview Report: The Overview provides a general idea of your website’s performance. This is where you will find data about your total clicks, impressions, click-through rate, and average positioning. Utilize this report when you want to understand how often your site is appearing in searches, which pages are receiving the most traffic, and which queries are bringing in the most clicks.
- Queries Report: This report showcases the exact queries searchers are using to find your website and where your site is ranking for each query. It will tell you which queries are bringing in the most impressions and clicks and which queries have the highest click-through rate. You can utilize this report to help you identify which keywords to target in your SEO efforts.
- Pages Report: This report provides detailed information about individual webpages and how they’re performing in terms of clicks, impressions, and click-through rate, as well as rankings by keyword and query. Utilize this report to help you identify which pages are performing well and where to focus your optimization efforts.
- Links Report: The Links report shows you how many external and internal links are pointing to various pages of your site and where they’re coming from. Use it to help you locate broken links, which can be detrimental to SEO and a good user experience.
Utilizing the Page Indexing Report is the fastest way to get an overview of which pages on your website have and have not been indexed by Google. To find it, locate the “Indexing” drop-down menu in the sidebar and click on the “Pages” tab.
Once opened, you’ll see a top-level summary page that includes a graph and current count indicating how many pages have been and haven’t been indexed.
What you’re looking for is a gradual increase in the number of indexed pages, relative to how often you’re publishing new content. Drastic drops or spikes could indicate an issue that requires further investigation.
Eventually, you’ll hope to see the canonical version of each important page group on your site indexed. Pages that have been submitted for indexing will have one of the following statuses:
- Crawl: Crawl status means that Googlebot is in the process of crawling the page to gather information and determine if the page is worthy of being indexed.
- Indexing: Indexing status tells you that the page has been analyzed by Googlebot and stored in the index servers. This indicates that the page is eligible to rank in SERPs, but doesn’t necessarily mean that the page is currently ranking.
- Serving: Serving status indicates that a page has been indexed and is being served in Google search results.
Highlight the Error tab and scroll down to the Details section. You will see that the errors have been grouped into the following detailed views:
- Why pages aren’t indexed table: This table shows the various status codes that explain why URLs weren’t indexed. Click on each row to open a detailed view of URLs that are affected by this issue, as well as a history of this issue on your site.
- Improve page experience table: This table shows pages that were indexed, but Google is recommending some changes that will improve the search engine’s ability to understand the content.
- View data about indexed pages: Click this link for a list of pages that are indexed, as well as historical data about how many pages on your site have been indexed over time.
We’ll be focusing on the “Why pages aren’t indexed” table for the purposes of identifying and fixing Search Console indexing errors.
Using the URL Inspection Tool to Identify Indexing Errors
You can use the URL Inspection Tool to get a deeper understanding of how Google sees specific pages on your website. Use it anytime you want detailed information about a specific page’s current indexing status and any errors that are preventing the page from being indexed.
Here’s how to use the URL Inspection Tool Step-by-Step:
- Locate and select the URL Inspection Tool in the main GSC header.
- Enter the URL of the webpage you want to inspect and hit Enter.
- The tool will tell you if the page has been indexed, if the status is pending, or if it’s not indexed.
- If the page isn’t indexed, you’ll be provided with the reasons why. Use the list below to determine what common search indexing errors mean and what action to take next.
Now, let’s dive into what common search console errors mean, how to fix them, and which warnings may not require any action at all.
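Many of the warnings below map directly onto HTTP status codes. As a rough orientation, here is a small Python sketch that maps a status code to the Search Console warning you’re most likely to see. This is a simple heuristic for illustration, not Google’s actual logic:

```python
def index_warning_hint(status_code):
    """Map an HTTP status code to the likely Search Console warning.
    A rough heuristic based on the warning categories covered in this article."""
    if 500 <= status_code <= 599:
        return "Server error (5xx)"
    if status_code == 401:
        return "Blocked due to unauthorized request (401)"
    if status_code == 403:
        return "Blocked due to access forbidden (403)"
    if status_code == 404:
        return "Page not found (404)"
    if status_code in (301, 302, 307, 308):
        return "Page with redirect"
    return "No status-based warning expected"

print(index_warning_hint(503))  # → Server error (5xx)
print(index_warning_hint(404))  # → Page not found (404)
```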
Server error (5xx)
This warning indicates that Googlebot encountered a server error when it attempted to crawl your page for indexing. If you can load the page in your browser now, there’s a good chance the server issue resolved itself. If not, you should check with your developer or hosting provider to determine what further action is needed.
Redirect error
This warning indicates that Googlebot encountered a redirect error when it attempted to crawl and index your page. This error occurs when the redirect chain is too long, the redirect loops back to the same page, a URL in the chain exceeds the maximum URL length, or there is a bad or empty URL in the redirect chain.
To fix this issue, you should investigate the redirect to identify and correct the error. Try to avoid redirect chains with multiple steps and ensure all pages leading to the final destination are loading correctly.
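If you keep your redirect rules in one place, you can audit them for loops and long chains before Googlebot trips over them. A minimal sketch, where `redirect_map` is a hypothetical stand-in for your site’s actual redirect rules:

```python
def check_redirects(redirect_map, start, max_hops=5):
    """Walk a {url: redirect_target} mapping and report whether the chain
    starting at `start` is clean, loops back on itself, or is too long."""
    seen = [start]
    url = start
    while url in redirect_map:
        url = redirect_map[url]
        if url in seen:
            return "loop: " + " -> ".join(seen + [url])
        seen.append(url)
        if len(seen) - 1 > max_hops:
            return "chain too long: " + " -> ".join(seen)
    return "ok: " + " -> ".join(seen)

rules = {"/old": "/interim", "/interim": "/old"}  # a two-step redirect loop
print(check_redirects(rules, "/old"))  # → loop: /old -> /interim -> /old
```

Ideally each retired URL redirects straight to its final destination in a single hop.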
URL marked “noindex”
This URL has been marked with a “noindex” tag, meaning Google will not include the page in the search results. Ask yourself whether you want visitors to find this page through search. If not, no further action is required. If you do, remove the “noindex” tag and resubmit the page for indexing.
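You can spot-check a page’s HTML for a robots meta tag yourself. Here is a small Python sketch using the standard library parser; note that a “noindex” can also arrive via an `X-Robots-Tag` HTTP header, which this snippet does not cover:

```python
from html.parser import HTMLParser

class NoindexChecker(HTMLParser):
    """Look for <meta name="robots" content="...noindex..."> in a page's HTML."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            if a.get("name", "").lower() == "robots" and "noindex" in (a.get("content") or "").lower():
                self.noindex = True

def has_noindex(html):
    checker = NoindexChecker()
    checker.feed(html)
    return checker.noindex

page = '<head><meta name="robots" content="noindex, follow"></head>'
print(has_noindex(page))  # → True
```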
Submitted URL has a crawl issue
This warning indicates that the page was submitted for indexing, but Googlebot encountered a crawl issue when it attempted to crawl the page. Use the URL Inspection Tool to find out exactly what the issue is.
Many times, Google could not load the page due to issues with certain page elements, such as JavaScript, CSS, or certain images. Try visiting the page to see if it’s loading correctly now. If it is, re-submit the page for indexing. If not, you’ll need to correct the issue before re-submitting.
Crawled – currently not indexed
The page was crawled and not indexed, but no specific reason was given. You should consider adding useful content and improving optimization to increase the chances of being indexed the next time the page is crawled. There’s no need to resubmit a crawl request.
Blocked by page removal tool
This page was blocked for indexing by someone on your team using a page removal tool. You should verify that the page was blocked intentionally. Note that removal requests only remain in effect for 90 days. After that, the page will likely be re-indexed unless you implement a proper “noindex” tag, redirect, or remove the page.
Discovered – currently not indexed
The page has been discovered but not indexed. This typically means that Google intended to crawl the page but rescheduled the crawl for an unspecified reason. It will re-attempt to crawl the page at a later date.
If you’re noticing this error a lot and have a larger website (10,000+ pages), it could mean that the server was overloaded when Google attempted to crawl the page. Talk to your hosting provider to see if this was the case and what steps should be taken.
It could also mean that your site has exceeded its crawl budget. This can happen when your CMS is auto-generating content, or you have an excessive amount of user-generated content or filtered product category pages. Consider removing repetitive content or blocking unnecessary pages from being indexed.
Blocked by robots.txt
The page has been blocked from crawling by your site’s robots.txt file. Google may still index the page if it can find information about it without loading it, such as through links from other pages. Verify that the page was blocked intentionally. If you want to guarantee that the page stays out of the index, remove the robots.txt block and use a “noindex” directive instead, since Googlebot can’t see a “noindex” tag on a page it isn’t allowed to crawl.
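Python’s standard library can evaluate robots.txt rules for you, which makes it easy to confirm whether a given URL is actually blocked. The rules below are a hypothetical example; substitute your own file:

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt rules; replace with the contents of your site's actual file.
rules = """
User-agent: *
Disallow: /cart/
Disallow: /account/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# can_fetch reports whether a given crawler is allowed to request a URL.
print(parser.can_fetch("Googlebot", "https://example.com/cart/checkout"))  # → False
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))      # → True
```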
Blocked due to unauthorized request (401)
This is a common issue when the page requires authorization, such as a password for access. You should verify that authorization requirements have been implemented correctly and no further action is needed.
It’s also worth noting that this error can occur when a developer links to pages on a staging site while the site is under construction but forgets to update the links once the site goes live. To fix the issue, you’ll need to update the links.
Blocked due to access forbidden (403)
This error is similar to a 401. The page will not be indexed because Googlebot cannot provide the proper credentials. If you want this page to be indexed, you’ll need to allow access for non-signed-in users or explicitly allow Googlebot to load the page without authentication.
Crawl anomaly
There’s an unspecified anomaly that is preventing the page from being crawled and indexed. One of the most common causes is that the page no longer exists or the page is redirecting to a page that’s returning a 404 error. Ensure that there’s only one step in any redirect chains leading from this page and that the page you’re directing to is loading correctly.
Alternate page with proper canonical tag
This indicates that the page is duplicate content and is currently pointing to the correct canonical page. There’s nothing to do here unless you want to look for a way to consolidate both pages into one URL.
Duplicate without user-selected canonical
This warning indicates that there are duplicate pages, but none have been marked canonical. Google has chosen a different page and indexed it as canonical. If you think Google has marked the wrong URL, you should choose and mark the correct canonical page with a proper canonical tag.
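When auditing canonicals across a set of duplicate pages, it helps to extract what each page actually declares. A minimal sketch using the standard library parser (the example URL is hypothetical):

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Find the URL declared by <link rel="canonical" href="...">."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            a = dict(attrs)
            if (a.get("rel") or "").lower() == "canonical":
                self.canonical = a.get("href")

def find_canonical(html):
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

page = '<head><link rel="canonical" href="https://example.com/widgets"></head>'
print(find_canonical(page))  # → https://example.com/widgets
```

If every duplicate in the group declares the same canonical URL, Google is far more likely to honor your choice.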
Duplicate non-HTML page
Google has discovered a PDF or some other non-HTML resource on your website that is a duplicate of another page that has been marked canonical. These pages should not be indexed, so there’s no further action required here.
Duplicate, Google chose a different canonical than user
The URL for this page is marked as canonical, but Google thinks a different page would make a better canonical for this set of pages. This can happen when you specify one version of a page as canonical but then redirect to a different version. You should review your canonical tag for this set of pages and ensure that the correct one is indicated.
Page not found (404)
Google discovered a URL without any request to be crawled, but the page no longer exists, and no redirect has been implemented. If possible, you should implement a 301 redirect to an appropriate page. If no appropriate page exists, you can leave the 404 as is, but it’s best to avoid 404s whenever possible.
Page with redirect
This page wasn’t indexed because it has been redirected. As long as the page was redirected intentionally, there’s no further action required.
Queued for crawling
This page has been added to the crawling queue. You’ll need to check back later for updates.
Soft 404
This page wasn’t indexed because it no longer contains useful information. This often occurs when a user-friendly “not found” message has been added without the proper 404 HTTP response code. You should implement a 301 redirect to an appropriate page, repopulate the page with useful content, or convert the page to a proper 404 response.
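The defining trait of a soft 404 is a mismatch between the HTTP status code and the content: the server says 200 OK, but the page says “not found.” A simple heuristic check, for illustration only (the phrase list is an assumption and nowhere near exhaustive):

```python
NOT_FOUND_PHRASES = ("page not found", "no longer available", "404")

def looks_like_soft_404(status_code, body_text):
    """A soft 404 returns HTTP 200 but shows 'not found' content.
    A real 404 returns the 404 status code itself. Simple heuristic only."""
    body = body_text.lower()
    return status_code == 200 and any(phrase in body for phrase in NOT_FOUND_PHRASES)

print(looks_like_soft_404(200, "Oops! Page not found."))  # → True  (soft 404)
print(looks_like_soft_404(404, "Oops! Page not found."))  # → False (a proper 404)
```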
Submitted URL dropped
This indicates a URL that has been submitted for indexing but was dropped from the index without a specified reason. You should consider adding some fresh informational content and improving optimization to improve the chances of being re-indexed or implement a 301 redirect to an appropriate page.
How to Tell Google You’ve Fixed an Indexing Issue
Here’s how to tell Google you’ve fixed an issue and the page is ready to be re-crawled for indexing:
- Open the Page Indexing Report and click on the URL you’re ready to re-submit.
- Go through the list of page details to ensure that you’ve addressed all of the issues listed. When you’re satisfied, click on “Validate Fix.”
- Google will send you an email to indicate that the validation process has begun. This process can take several weeks. Once Google resolves the issues, there’s a good chance your page will finally be indexed and begin ranking in SERPs for relevant search queries.
Understanding what common search console warnings mean and how to address them is an essential first step in resolving your page indexing issues. With a little basic knowledge, it’s not hard to ensure that the right pages are indexed and that you get the results you want.