
Myth: Optimizing the Crawled Version of Your Site is Gaming the System

October 5, 2023


Myth: Is Optimizing the Crawled Version of Your Site Gaming the System?

It’s hard to say where this myth came from, because Google and the other major bots that crawl your site publish documentation telling developers exactly how to optimize a site for their crawlers. It’s in their best interest for you to serve an easy-to-crawl version of your site that consumes the fewest crawler resources.


Here is a diagram that illustrates how crawlers process your site with and without Nostra.


Without Nostra

Without Nostra, your pages are delivered unrendered, so their content depends on JavaScript that the crawler must execute. According to Google’s documentation, here is how Googlebot processes pages with JavaScript:

Google processes JavaScript web apps in three main phases:
1.) Crawling
2.) Rendering
3.) Indexing
Googlebot queues pages for both crawling and rendering. It is not immediately obvious when a page is waiting for crawling and when it is waiting for rendering.
When Googlebot fetches a URL from the crawling queue by making an HTTP request, it first checks if you allow crawling…Googlebot then parses the response for other URLs in the href attribute of HTML links and adds the URLs to the crawl queue.
Googlebot queues all pages for rendering, unless a robots meta tag or header tells Google not to index the page. The page may stay on this queue for a few seconds, but it can take longer than that. Once Google's resources allow, a headless Chromium renders the page and executes the JavaScript.
Googlebot parses the rendered HTML for links again and queues the URLs it finds for crawling. Google also uses the rendered HTML to index the page. [source]

To restate Google’s documentation in terms of the diagram above: Google moves from step 1 to step 2 to process the page and sends any other URLs it finds back to the queue for more crawling (steps 3 and 4). If the page contained no JavaScript, it would be done and placed in the index. Because the page has not been pre-rendered, it is instead sent to the Render Queue (step 5) to wait until Google’s Web Rendering Service (WRS) is available. Once the renderer is free, the page is rendered (step 6) and the results are sent back to processing a second time (step 7), which sends any newly discovered URLs back to the queue for more crawling (steps 9 and 10), and finally the page is sent to the search index (step 8).


That means extra steps, extra processing, more crawl budget consumed, more compute used, and extra time spent waiting in an additional queue. And Google leaves you with this caution in its developer documentation:

While Google Search executes JavaScript, there are JavaScript features with limitations in Google Search and some pages may encounter problems with content not showing up in the rendered HTML. Other search engines may choose to ignore JavaScript and won't see JavaScript-generated content. [source]
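To make this concrete, here is a minimal sketch of what a crawler receives from a client-side rendered page versus a pre-rendered one. The URLs, element IDs, and product details are hypothetical, not taken from any real store. A bot that doesn’t execute JavaScript sees only the first, nearly empty response.

```typescript
// Illustrative only: the URLs, element IDs, and product details below are
// hypothetical and not taken from any real store.

// 1) Client-side rendered response: the initial HTML carries no product
//    content. A crawler must queue the page for rendering and execute app.js
//    before any of the real content exists.
const clientRenderedResponse = `
<!DOCTYPE html>
<html>
  <head><title>Example Store</title></head>
  <body>
    <div id="root"></div> <!-- empty until JavaScript runs -->
    <script src="/bundles/app.js"></script>
  </body>
</html>`;

// 2) Pre-rendered response: the same page after server-side rendering.
//    All content and links are already in the HTML, so crawling and parsing
//    the response is enough.
const preRenderedResponse = `
<!DOCTYPE html>
<html>
  <head><title>Example Store – Blue Widget</title></head>
  <body>
    <div id="root">
      <h1>Blue Widget</h1>
      <p>$29.99 – in stock</p>
      <a href="/products/red-widget">Related: Red Widget</a>
    </div>
  </body>
</html>`;

// A bot that does not execute JavaScript only "sees" text present in the raw HTML.
const visibleToSimpleBot = (html: string): boolean => html.includes("Blue Widget");

console.log(visibleToSimpleBot(clientRenderedResponse)); // false
console.log(visibleToSimpleBot(preRenderedResponse));    // true
```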

With Nostra

With Nostra, pages on your site are pre-rendered and cached. This means crawlers can process your pages and put them in the index without any additional rendering queue or processing. It also eliminates the second pass through the processing step. Google states in its developer documentation on crawling pages with and without JavaScript:

Crawling a URL and parsing the HTML response works well for classical websites or server-side rendered pages where the HTML in the HTTP response contains all content. [source]

This is exactly what Nostra does: it pre-renders your site server-side, converting a complex, JavaScript-dependent site into the kind of “classical” site that is simpler and faster for bots to crawl.
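Nostra’s own pipeline is proprietary, but the general technique the quote describes, rendering a page in a headless browser on the server and caching the resulting HTML, can be sketched roughly as follows. This is a simplified illustration assuming Puppeteer; the cache, TTL, and example URL are hypothetical.

```typescript
import puppeteer from "puppeteer";

// Minimal pre-render-and-cache sketch. Real systems add cache invalidation,
// error handling, and per-page freshness rules; everything below is illustrative.
const cache = new Map<string, { html: string; renderedAt: number }>();
const TTL_MS = 15 * 60 * 1000; // assumed 15-minute freshness window

async function getPreRenderedHtml(url: string): Promise<string> {
  const hit = cache.get(url);
  if (hit && Date.now() - hit.renderedAt < TTL_MS) {
    return hit.html; // serve the cached, already-rendered page
  }

  // Render the page once in headless Chromium, just as a browser would.
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle0" });
    const html = await page.content(); // fully rendered, "classical" HTML
    cache.set(url, { html, renderedAt: Date.now() });
    return html;
  } finally {
    await browser.close();
  }
}

// Example: pre-render a hypothetical product page for crawlers.
getPreRenderedHtml("https://example-store.com/products/blue-widget")
  .then((html) => console.log(html.length, "bytes of static HTML cached"));
```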

Google also notes that not all bots have the same capabilities. We often talk about Google as if it were the only search engine, but many other bots hold significant regional market share. Beyond Bing, Yahoo!, and DuckDuckGo, global companies know that Yandex (Russia, parts of Asia and Europe), Baidu (China), Naver (South Korea), Yahoo! Japan, Yahoo! Taiwan, Seznam (Czech Republic), and Qwant (France) are highly popular search engines. With so many crawlers of varying capabilities and limitations, pre-rendering your site is the most reliable way to ensure it can be crawled globally without issues, even by bots that don’t support JavaScript.

Keep in mind that server-side or pre-rendering is still a great idea because it makes your website faster for users and crawlers, and not all bots can run JavaScript. [source]

Note: In the quote above, Google mentions that pre-rendering is great for both users and crawlers, and that’s true. Unfortunately, pre-rendering for users is not currently an option for stores running on e-commerce platforms such as Shopify or Salesforce Commerce Cloud. Much of the interactive functionality of the e-commerce app, and of the thousands of partner apps that can be installed, relies on client-side JavaScript executing in response to user interaction. Since bots don’t interact with the page, the store can safely be pre-rendered for them; users, however, would lose the interactivity and features powered by JavaScript that is designed to execute client-side at runtime.

These hybrid systems are called Dynamic Rendering: server-side pre-rendered pages are served to bots for optimized crawling, while client-side rendered pages are served to users for full-featured interactivity. The name “dynamic” rendering comes from the system’s ability to dynamically determine whether a page request is coming from a user or a bot and serve the appropriate version (pre-rendered for bots, client-side rendered for users).
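A rough sketch of that routing decision might look like the following. Express and a user-agent pattern match are assumptions for illustration, not Nostra’s actual implementation, and real bot detection is considerably more involved (verified IP ranges, reverse DNS lookups, and so on).

```typescript
import express from "express";
import { getPreRenderedHtml } from "./prerender"; // hypothetical module wrapping the earlier pre-render sketch

const app = express();

// Simplified bot detection via user-agent substrings; production systems
// verify crawlers against published IP ranges or with reverse DNS lookups.
const BOT_PATTERN = /googlebot|bingbot|yandex|baiduspider|duckduckbot|naver|seznambot|qwantify/i;

app.get("*", async (req, res) => {
  const userAgent = req.headers["user-agent"] ?? "";

  if (BOT_PATTERN.test(userAgent)) {
    // Crawler: serve the pre-rendered, cached HTML for easy crawling.
    const html = await getPreRenderedHtml(`https://example-store.com${req.path}`);
    res.type("html").send(html);
  } else {
    // Human visitor: serve the normal client-side rendered app shell so all
    // interactive JavaScript features keep working.
    res.sendFile("index.html", { root: "./public" });
  }
});

app.listen(3000);
```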

While the diagram above illustrates Nostra’s product that optimizes the crawler version of the site, Nostra has a companion product (not shown in the diagram) that optimizes the page experience for users’ client-side rendered pages.

Benefits of Optimizing for Crawlers

  • Crawl Rate Optimization - Crawl more pages. Google limits your crawl budget, the number of pages it crawls, based on the resources required to crawl them. If Google can load and crawl your pages with fewer resources, it can index much more content from your site.
  • Improves your site's crawl efficiency.
"Both the time to respond to server requests, as well as the time needed to render pages, matters, including load and run time for embedded resources such as images and scripts.” -Google  [source]
  • Get a bigger crawl budget - Faster response times to the crawler increase your total crawl budget.
  • New Products or Blog Pages show up in Search faster.
  • Rapidly changing content like campaigns, price changes, sales, news, or other frequently updated info on your site gets into the search index faster.
  • Avoid crawler slowdowns due to e-commerce platform outages, server errors, or network issues. Having a pre-rendered, cached version of your store for bots adds another layer of protection against having your crawl rate reduced (see the sketch after this list).
“5xx and 429 server errors prompt Google's crawlers to temporarily slow down with crawling. Already indexed URLs are preserved in the index, but eventually dropped. Network and DNS errors have quick, negative effects on a URL's presence in Google Search. Googlebot treats network timeouts, connection reset, and DNS errors similarly to 5xx server errors. In case of network errors, crawling immediately starts slowing down, as a network error is a sign that the server may not be able to handle the serving load.” -Google [source]
  • Eliminates JavaScript limitations and bot-specific errors that can prevent your site from being crawled properly.
  • It's the green solution. We estimate that reducing bot processing on Nostra’s customers’ pages saves 40,500 tons of CO2 over 12 months. Did you know roughly 80% of site traffic is bots? Nostra pre-renders pages and serves them from cache, so each bot request needs only a fraction of the compute power it would otherwise require. This not only offloads traffic from your origin but also significantly reduces CO2 emissions.
  • Global Commerce - While Google is the world’s most used search engine, it doesn’t dominate the market in every country. As noted above, crawlers from Yandex, Baidu, Naver, Yahoo! Japan, Yahoo! Taiwan, Seznam, Qwant, and others vary widely in their capabilities and limitations, so Dynamic Rendering is the most reliable way to ensure your site can be crawled globally without issues, even by bots that don’t support JavaScript.
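On the outage-protection point above, the usual technique is a “serve stale on error” cache policy: if the origin (or the re-render) fails, the last good cached copy is served to the bot instead of a 5xx. The sketch below is a hypothetical illustration, not Nostra’s implementation; the cache shape and fetch logic are assumptions.

```typescript
// Hypothetical "serve stale on error" policy for bot traffic: if the origin
// returns a 5xx (or the fetch fails), fall back to the last good pre-rendered
// copy so crawlers never see the error and Google has no reason to slow down.
type CachedPage = { html: string; renderedAt: number };
const botCache = new Map<string, CachedPage>();

async function serveBotRequest(url: string): Promise<string> {
  try {
    const res = await fetch(url); // origin fetch / re-render, simplified
    if (res.status >= 500) throw new Error(`origin returned ${res.status}`);
    const html = await res.text();
    botCache.set(url, { html, renderedAt: Date.now() });
    return html;
  } catch {
    const stale = botCache.get(url);
    if (stale) return stale.html; // a stale but crawlable page beats a 5xx
    throw new Error("no cached copy available");
  }
}
```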


