What Is: Googlebot

Googlebot is Google’s web crawler, also known as a spider. It navigates the web to discover new and updated content and adds it to Google’s index, the store of pages from which Google’s search results are drawn.

If you’re familiar with the basic indexing process at Google, you’ll know that newly published content doesn’t appear in search straight away: it can take Google a few days, sometimes longer, to pick it up. In essence, your website is waiting for Googlebot to come and crawl your new piece of content so that it can be included in the search results. Googlebot reads the data on a web page and passes it back to Google for processing and possible inclusion in the index.

Googlebot will visit your new page once it learns that the page exists. That may happen through a manual index request, or because another page has linked to it. The spider tries to find as much content on the web as it can, and it does this by navigating from link to link, whether internal or external. It follows links, fetches the pages, then follows all of the links on those pages, and so on. It’s a never-ending process.
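The link-following loop described above can be sketched as a breadth-first traversal. This is only an illustration and not Googlebot's actual implementation: the `PAGES` dictionary is a hypothetical in-memory stand-in for the web, and the `crawl` function and its URLs are invented for the example.

```python
from collections import deque

# Hypothetical miniature "web": each URL maps to the links found on that page.
PAGES = {
    "https://example.com/": ["https://example.com/blog", "https://example.com/about"],
    "https://example.com/blog": ["https://example.com/blog/new-post", "https://example.com/"],
    "https://example.com/about": [],
    "https://example.com/blog/new-post": ["https://example.org/"],
    "https://example.org/": [],
}


def crawl(start_url, pages):
    """Return URLs in the order they are discovered, following links breadth-first."""
    discovered = [start_url]
    seen = {start_url}           # never queue the same URL twice
    queue = deque([start_url])   # pages waiting to be "fetched"
    while queue:
        url = queue.popleft()
        # "Fetch" the page and look at every link on it, internal or external.
        for link in pages.get(url, []):
            if link not in seen:
                seen.add(link)
                discovered.append(link)
                queue.append(link)
    return discovered


order = crawl("https://example.com/", PAGES)
print(order)
```

In this toy graph the new post is only found because the blog page links to it, which mirrors the point that a page with no inbound links has to be submitted manually before a crawler can know about it.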

How Googlebot Works

On arrival, Googlebot will look at the following elements on your site:

  • Your on-page SEO, such as title tags, headings, paragraphs and images, to gain an understanding of the purpose of your content;
  • Your website’s navigational structure and the links contained within your hierarchy;
  • The code and elements that make up your page;
  • Links on your website to external sources;
  • The visible content on your pages that’s readable by both crawlers and humans;
  • Lots of other resources that are visible to Googlebot.
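To make the list above more concrete, here is a minimal sketch of pulling a few of those on-page elements (title, headings, links and image alt text) out of a page with Python's standard `html.parser` module. The `PageSummary` class and `SAMPLE_HTML` are hypothetical names made up for this example, and the sketch makes no claim about how Googlebot itself parses a page.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collects the title, h1-h3 headings, link targets and image alt text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self.links = []
        self.image_alts = []
        self._current = None  # tag whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title" or tag in ("h1", "h2", "h3"):
            self._current = tag
            if tag != "title":
                self.headings.append("")  # start a new heading entry
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img":
            self.image_alts.append(attrs.get("alt") or "")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current is not None:
            self.headings[-1] += data


SAMPLE_HTML = """
<html><head><title>What Is Googlebot?</title></head>
<body>
<h1>Googlebot</h1>
<p>Intro text.</p>
<h2>How It Works</h2>
<a href="/blog">Blog</a>
<a href="https://example.org/">External</a>
<img src="bot.png" alt="A crawler">
</body></html>
"""

parser = PageSummary()
parser.feed(SAMPLE_HTML)
print(parser.title, parser.headings, parser.links, parser.image_alts)
```

Running something like this against your own pages is a quick way to see whether the title, headings and alt text actually describe the content, since a crawler can only work with what is present in the markup.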

On the other hand, what Googlebot collects is also used to identify black-hat activities and anything else that may be against the Google guidelines. I have listed some of these activities below:

  • Unrelated keywords and intentional keyword stuffing in content;
  • Very small text, hidden text and hidden links in content;
  • Duplicate content, i.e. a lot of content that appears on another website in the same form;
  • Lots of comment spam on the page;
  • Over-optimisation for certain keywords (too many uses of the same keyword);
  • Lots of other things that may be against the guidelines.

To determine whether or not Googlebot is picking up your existing content, you can check your pages’ index status in Google Search Console (formerly Webmaster Tools), or you can run a site search command (site: followed by your domain) in Google:

Site Search Command In Google

All of the pages in this screenshot (and more) are accessible by Googlebot and are being crawled and indexed. If your pages aren’t appearing once you have tried this command, it may be worth checking whether anything on your site is restricting Googlebot’s access, e.g. a noindex tag in the head of the page.
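As a rough way to check for the noindex case, the sketch below scans a page's HTML for a robots meta tag. The `RobotsMetaParser` class and `has_noindex` function are hypothetical helpers written for this example. It only looks at meta tags named `robots` or `googlebot`, so it will not catch other blockers such as a robots.txt rule or an `X-Robots-Tag` HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attrs.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


blocked = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
allowed = '<html><head><meta name="description" content="About Googlebot"></head></html>'
print(has_noindex(blocked), has_noindex(allowed))
```

If this returns True for a page you expect to rank, removing the directive and requesting a fresh crawl is the usual next step.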

