After your website is crawled, Google needs to sort through it and categorize it into its index. The index is where all the crawled data is stored. Think of it like a filing cabinet: all of your website's information is neatly organized, and Google can easily retrieve it when it needs to. However, just because your website has been crawled does not mean it has been indexed, or indexed properly. To see whether a URL is indexed, you can use the URL Inspection tool in Google Search Console. If your URL is not indexed, you can request indexing from that same tool (this replaced the retired Fetch as Google feature).
Sometimes, pages are removed from the index. There are multiple reasons this could happen, but some of the most common include:
- The URL returns a 4xx or 5xx error code (often fixable with a 301 redirect to a working page)
- The URL has a noindex meta tag telling search engines not to index it
- The URL violates Google's Webmaster Guidelines and has been removed
- Crawlers are blocked by a password requirement
One of the best ways to tell Google how to crawl and index your URLs is through meta directives, or meta tags. These are lines of code that help search engines determine what to index, which links to follow next, and whether to keep an archived copy of a page. There are two types of meta directives: robots meta tags and X-Robots-Tags.
Robots Meta Tags
Robots meta tags are set per page. The tag goes in the <head> section of your page's HTML, and it can apply to all search engines or only specific ones. Common directives include:
- Index/Noindex: Tells search engines whether a particular page should be indexed. Noindex is useful for trimming search results down, so users aren't served unnecessary pages like other users' profiles. Keep in mind that visitors can still reach these pages within your website; they just won't show up in search results.
- Follow/Nofollow: Tells search engines whether links on the page should be followed to continue crawling and indexing. Nofollow is typically used alongside noindex.
- Noarchive: Prevents search engines from saving a cached copy of the page, usually used on e-commerce websites where prices change frequently.
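As a sketch, a page you want crawled but kept out of search results might carry something like this in its <head> (the robots meta tag is the relevant line; the title and surrounding markup are illustrative):

```html
<head>
  <title>User Profile</title>
  <!-- Keep this page out of the index, but let crawlers follow its links -->
  <meta name="robots" content="noindex, follow">
  <!-- To target only Google's crawler instead of all bots, use name="googlebot" -->
</head>
```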
X-Robots-Tags offer more flexibility than robots meta tags. An X-Robots-Tag is an HTTP response header set in your server configuration, so one rule can apply to many pages at once rather than being added to each page individually. It can also block crawlers from indexing non-HTML files, like pictures, videos, and PDFs, which have no <head> to hold a meta tag.
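For example, on an Apache server you might add something like the following to your configuration or .htaccess file (a sketch that assumes the mod_headers module is enabled) to keep every PDF on the site out of the index with a single rule:

```apache
# Send "X-Robots-Tag: noindex, nofollow" with every PDF response,
# so crawlers skip files that have no <head> for a meta tag
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```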
Why Would I Not Want All My Pages Indexed?
Good question! Keeping the index clean and small makes it easier for search engines to surface your site. Think of it like your desk: if it's cluttered with stacks of paper, it can be really difficult to find what you're looking for, but if it's tidy and organized, you can probably find what you need right away. If Google is your desk, search results are what's out on the desktop: the things you use the most and want handy at any time. Anything else is still accessible, but it might take opening a drawer to get to. Meta tags and X-Robots-Tags help you declutter whatever isn't absolutely necessary.