We’ll attempt to answer all these questions and more here today. We’ll go over what you need to know about React and SEO and what you can do to create an SEO-friendly React application.
Before we dive in, let’s cover a few basics.
What is React?
React is an open-source JavaScript library for building user interfaces. Although Facebook originally developed it, it is now maintained and supported by Meta and an active open-source community.
Having used jQuery to build websites for years, I always felt unproductive, until my first project with React. I haven’t touched jQuery since.
- Virtual DOM
React uses a virtual document object model (virtual DOM), a lightweight in-memory copy of the real DOM, to minimise expensive calls to the real DOM. Each time something changes in the UI, React builds a new virtual DOM tree and diffs it against the previous one. It then uses that diff to work out the minimum number of updates needed to apply those changes to the real DOM, which boosts rendering performance.
That is a big part of what makes React fast. It only modifies the real DOM when necessary, unlike jQuery, which applies every change, including CSS and styling changes, directly to the actual DOM, even when it’s unnecessary.
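Here is a deliberately tiny TypeScript sketch of the diffing step. It is not React’s actual reconciliation algorithm, which also uses keys, handles components and batches updates. The VNode shape and the h, text and diff helpers are hypothetical names used only for illustration. The sketch compares two virtual trees and returns only the patches a renderer would need to apply to the real DOM.

```typescript
// Hypothetical virtual DOM node: a text node, or an element with props and children.
type VNode =
  | { kind: "text"; value: string }
  | { kind: "element"; tag: string; props: Record<string, string>; children: VNode[] };

// One change to apply to the real DOM, located by its child-index path from the root.
type Patch =
  | { op: "replace"; path: number[]; node: VNode }
  | { op: "setProp"; path: number[]; name: string; value: string }
  | { op: "removeProp"; path: number[]; name: string }
  | { op: "insert"; path: number[]; node: VNode }
  | { op: "remove"; path: number[] };

function h(tag: string, props: Record<string, string>, ...children: VNode[]): VNode {
  return { kind: "element", tag, props, children };
}

function text(value: string): VNode {
  return { kind: "text", value };
}

// Compare the previous and next trees and collect only the changes that are needed.
function diff(prev: VNode, next: VNode, path: number[] = []): Patch[] {
  if (prev.kind === "text" || next.kind === "text") {
    // Text nodes (or a text/element mismatch) are replaced only if they differ.
    const same = prev.kind === "text" && next.kind === "text" && prev.value === next.value;
    return same ? [] : [{ op: "replace", path, node: next }];
  }
  // Both are elements from here on; a different tag means a new subtree.
  if (prev.tag !== next.tag) return [{ op: "replace", path, node: next }];

  const patches: Patch[] = [];
  for (const [name, value] of Object.entries(next.props)) {
    if (prev.props[name] !== value) patches.push({ op: "setProp", path, name, value });
  }
  for (const name of Object.keys(prev.props)) {
    if (!(name in next.props)) patches.push({ op: "removeProp", path, name });
  }
  // Children are compared by position here (React also matches moved items by key).
  const shared = Math.min(prev.children.length, next.children.length);
  for (let i = 0; i < shared; i++) {
    patches.push(...diff(prev.children[i], next.children[i], [...path, i]));
  }
  for (let i = shared; i < next.children.length; i++) {
    patches.push({ op: "insert", path: [...path, i], node: next.children[i] });
  }
  for (let i = prev.children.length - 1; i >= shared; i--) {
    patches.push({ op: "remove", path: [...path, i] });
  }
  return patches;
}

const before = h("ul", {}, h("li", {}, text("Home")), h("li", {}, text("Blog")));
const after = h("ul", {}, h("li", {}, text("Home")), h("li", { class: "active" }, text("Blog")));
console.log(diff(before, after)); // a single setProp patch for the second <li>
```

Only the second list item changed, so the sketch produces one patch instead of re-creating the whole list.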
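- Components over Templates
React builds user interfaces from reusable, self-contained components written in JavaScript (usually with JSX), rather than from separate template files.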
- One-Way Data Flow
React implements one-way data flow. Parent components pass data down to their children via props. This contrasts with the two-way data binding Angular is known for, and it’s one of the reasons React apps perform well.
With two-way binding, a change in a child component can trigger changes in the parent’s state. Props instead let you create loosely coupled components in which only the parent handles tasks such as data fetching and processing. That makes your UI more predictable and easier to maintain.
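Here is a framework-free sketch of that flow. It uses plain TypeScript functions in place of React components. The parent owns the state, and the child renders purely from its props. The only way for the child to affect the parent is through a callback prop. ClickCounter and CountButton are hypothetical names. In React, the parent would hold the count with useState and re-render automatically.

```typescript
// Props flow down: the child gets data plus a callback, never the parent's state itself.
interface CountButtonProps {
  label: string;
  count: number;
  onIncrement: () => void;
}

// Child "component": renders purely from its props.
function CountButton(props: CountButtonProps): string {
  return `<button>${props.label}: ${props.count}</button>`;
}

// Parent "component": owns the state and is the only place it changes.
class ClickCounter {
  private count = 0;

  // Events flow up through this callback; the child can't touch `count` directly.
  private increment = (): void => {
    this.count += 1;
  };

  render(): { html: string; click: () => void } {
    const props: CountButtonProps = { label: "Clicks", count: this.count, onIncrement: this.increment };
    // `click` simulates the user clicking the rendered button.
    return { html: CountButton(props), click: props.onIncrement };
  }
}

const counter = new ClickCounter();
const first = counter.render();
first.click();                   // the child reports a click; the parent updates its state
const second = counter.render(); // re-rendering passes the new value down as props
console.log(first.html);  // <button>Clicks: 0</button>
console.log(second.html); // <button>Clicks: 1</button>
```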
Basic Principles of SEO
Before we dive into the SEO strategies and best practices for React, let’s first go over a few basic principles of search engine optimisation.
SEO is the practice of improving your website’s visibility in search engines. Its practitioners aim to rank higher on search engine results pages (SERPs) so their websites get more organic traffic.
Also, according to Backlinko, the top three results in the SERPs hog about 75% of user clicks. It only gets worse: results from the second page and beyond get about 0.78% of the clicks.
That explains why digital businesses are furiously working around the clock to appear on the first page.
Web Development and SEO: How Do the Two Connect?
Don’t think of SEO as a standalone feature that you can tack onto your website. A common mistake is to treat SEO as a secondary task that comes after development.
It isn’t. SEO should be a foundational aspect of your website’s development, because search engine crawlers rely on specific technical aspects of your site to identify and index it correctly.
Google uses its spiders to crawl the web, fetch data from HTML documents, and store that information in its database for indexing. In other words, search engine crawlers read your web pages much as a browser would, by fetching and parsing their code.
That means the way you write and structure your site’s code and content affects how search engine crawlers see your website.
Every search engine has its own crawlers, each of which comes with its own set of rules and guidelines for indexing content.
Google, the most popular search engine out there, uses 200-plus ranking factors to determine a website’s relevance and worthiness to users.
How Search Engine Crawlers Work
Googlebot explores the web link by link. On each page, it tries to gather information such as the page’s freshness, the number (and quality) of its backlinks, and how unique its content is.
It then downloads that information, together with the page’s HTML and CSS files, and sends it to Google’s servers. There it is analysed and indexed by Google’s indexing system, Caffeine.
Note that this is a fully automated process, so you want to make sure your website is structured in a way that lets search engine crawlers understand its content.
That’s where the problem sets in.
What’s Wrong with Optimising Single Page Applications for Search Engines?
Many modern web apps use the single-page application (SPA) approach, in which pages are rendered on the client side. If you’ve ever used Twitter, you may have noticed that its user interface feels different from your average blog or news website.
Speaking of which, the HTML document that a typical client-rendered React app sends to the browser contains little more than an empty <div id="root"></div> element and a <script> tag that loads the app’s JavaScript.
As you can see, this page only contains an external script and a <div> tag, nothing else.
This means the browser has to download and run the script first, and only then is the content loaded dynamically onto the page.
So when search engine bots that don’t run the script land on the page, they find an empty page instead of the actual content.
As a result, they end up not indexing it, at least not properly.
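The sketch below approximates what a crawler that doesn’t run JavaScript can extract from a page’s body. It uses a naive, regex-based text extraction for brevity; real crawlers parse HTML properly. Both HTML strings are hypothetical examples, including the /static/js/main.js script path. One is a client-rendered shell, and the other is the same page rendered on the server.

```typescript
// Approximate the text a non-JavaScript crawler could index from a page's <body>.
// Deliberately naive: drops <script>/<style> blocks, then strips all remaining tags.
function indexableBodyText(html: string): string {
  const match = /<body[^>]*>([\s\S]*)<\/body>/i.exec(html);
  const body = match ? match[1] : html;
  return body
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/<style[\s\S]*?<\/style>/gi, "")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Hypothetical client-rendered shell: an empty root element plus a script tag.
const spaShell = `<!DOCTYPE html>
<html>
  <head><title>My App</title></head>
  <body>
    <div id="root"></div>
    <script src="/static/js/main.js"></script>
  </body>
</html>`;

// The same page rendered on the server: the content is already in the HTML.
const serverRendered = `<!DOCTYPE html>
<html>
  <head><title>My App</title></head>
  <body>
    <div id="root"><h1>React and SEO</h1><p>Server-rendered content.</p></div>
    <script src="/static/js/main.js"></script>
  </body>
</html>`;

console.log(JSON.stringify(indexableBodyText(spaShell)));       // ""
console.log(JSON.stringify(indexableBodyText(serverRendered))); // "React and SEO Server-rendered content."
```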
Google has been working on this problem, and Googlebot can now execute JavaScript and render client-side content before indexing it.
That is great news, but a few problems persist.
Google has publicly said that it might take weeks or even months before bots crawl and index dynamically loaded pages properly, as reported by Google Chrome developer Paul Kinlan.
As you might expect, this is a major issue for the many companies that rely on dynamic content to power their sites.
Only once the crawler has finished rendering and deciphering that content can it send it to Google’s servers for indexing.
Limited Crawling Budget
The crawl budget is like a pie of resources that a search engine crawler has to divide among the URLs it finds while crawling.
So if your website has hundreds or thousands of pages, each page gets only a fraction of the crawl budget.
More precisely, the crawl budget is the maximum number of pages a search engine crawler can process within a given period.
Once that period is up, the crawler leaves your site, whether or not it has downloaded all your pages.
If a web page takes too long to load because it has to run a script first, the bot might leave before indexing it.
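To see why slow, script-heavy pages hurt, here is a toy model in TypeScript. It is not how Google actually allocates crawl budget. It simply assumes the crawler has a fixed amount of time for your site and that each page has a fixed cost. Every number in it is hypothetical.

```typescript
// Toy model (an assumption, not Google's algorithm): the crawler spends a fixed
// time budget on your site, and each page costs download time plus script time.
interface PageCost {
  fetchMs: number;  // time to download the HTML
  renderMs: number; // extra time spent running scripts before content appears
}

// Number of pages the crawler gets through before the budget runs out.
function pagesCrawled(budgetMs: number, cost: PageCost, totalPages: number): number {
  const perPageMs = cost.fetchMs + cost.renderMs;
  return Math.min(totalPages, Math.floor(budgetMs / perPageMs));
}

const budgetMs = 60000; // hypothetical: one minute for the whole site
const totalPages = 500; // hypothetical site size

const staticHtml = pagesCrawled(budgetMs, { fetchMs: 200, renderMs: 0 }, totalPages);
const clientRendered = pagesCrawled(budgetMs, { fetchMs: 200, renderMs: 1800 }, totalPages);

console.log(staticHtml);     // 300
console.log(clientRendered); // 30
```

With these made-up numbers, the client-rendered version gets a tenth as many pages crawled in the same window.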
So if you’re planning to build your website with React, getting it ranked on these search engines can be next to impossible unless you address these problems.
The best way to solve them is to tackle them at the stage where you design your app’s architecture.
Let’s see how you can do it.
How to Solve These Problems
Step 1: Isomorphic Rendering
Isomorphic (or universal) rendering means running the same JavaScript code on both the server and the client. It allows you to provide the same content regardless of whether a page is accessed by a browser or a search engine crawler.
Several React frameworks support this approach, such as Next.js, which was developed by ZEIT (now Vercel) to power its documentation website with React.
How it Works
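When a request comes in, the server renders the app to HTML and sends it to the browser. On the client side, the app uses that HTML as a base and continues to operate on it (a step known as hydration) as if the browser had rendered it.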
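Here is a minimal, framework-free sketch of the isomorphic idea. It uses one pure render function shared by the server and the client. The names renderApp, renderDocument, markupMatches, window.__INITIAL_DATA__ and /client.js are hypothetical, not a specific library’s API. In a real React app, the server would typically call renderToString from react-dom/server, and the client would call hydrateRoot from react-dom/client.

```typescript
interface Post {
  title: string;
  body: string;
}

// Escape text before inserting it into HTML.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Shared, pure render function: the same code would run on the server and in the browser.
function renderApp(post: Post): string {
  return `<article><h1>${escapeHtml(post.title)}</h1><p>${escapeHtml(post.body)}</p></article>`;
}

// Server side: wrap the markup in a full document and embed the data, so the
// client can reuse it instead of fetching it again.
function renderDocument(post: Post): string {
  const state = JSON.stringify(post).replace(/</g, "\\u003c"); // keep "</script>" out of the JSON
  return [
    "<!DOCTYPE html>",
    `<html><head><title>${escapeHtml(post.title)}</title></head><body>`,
    `<div id="root">${renderApp(post)}</div>`,
    `<script>window.__INITIAL_DATA__ = ${state};</script>`,
    `<script src="/client.js"></script>`,
    "</body></html>",
  ].join("\n");
}

// Client side (sketch): re-running renderApp with the embedded data yields the same
// markup the server sent, so the browser can take over instead of re-rendering.
function markupMatches(documentHtml: string, post: Post): boolean {
  return documentHtml.includes(`<div id="root">${renderApp(post)}</div>`);
}

const post: Post = { title: "React & SEO", body: "Rendered on the server." };
const html = renderDocument(post);
console.log(markupMatches(html, post)); // true
```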
How to Build an Isomorphic App
Isomorphic apps aren’t easy to build, and they can be time-consuming. Luckily, some frameworks speed up the whole process and take much of the hassle out of it.
Two popular examples of these frameworks are Next.js and Gatsby.
Next.js automates hot code reloading and code splitting. It can also do full-fledged server-side rendering, generating the HTML for each request at the moment the request is made.
Gatsby, on the other hand, operates as an open-source compiler. It lets you build fast and powerful websites, but it doesn’t offer full-fledged server-side rendering.
Instead, it generates a static website beforehand and stores all the generated HTML files in the cloud or on the website’s server.
Now, let’s try and compare the two.
Next.js Vs. Gatsby
Gatsby solves SEO challenges by generating static HTML web pages. These pages are generated in advance, at build time, and simply served to the client’s browser when it requests them.
The generated HTML content can then be hosted in the cloud or your hosting service. Such websites tend to be super-fast since they aren’t generated upon request. Nor do they need the browser or search engine bot to pull data from the database or via an API.
Instead, the data is fetched at build time. This means that if your website or page gets new content, that content won’t be displayed until you run another build.
When to Use Gatsby
Gatsby is an excellent choice if your app doesn’t update its data frequently. For example, it is a poor fit for a busy blog or a forum.
Nor should you use it for a site that loads hundreds of new posts or comments, such as a social network.
Next.js uses the server-side rendering approach. Unlike client-side rendering, the framework generates the HTML/CSS content on the server before sending it to the browser.
The HTML content is generated on the spot each time the user sends a request.
When to Use Next.js
Next.js works great when the app contains dynamic data, such as a social media network, blog, or forum.
For server-side rendering to work in a React app, the developer has to use a Node.js server that processes requests at runtime.
Server-side Rendering (SSR) with Next.js
Let’s go through the Next.js rendering algorithm to see how it works:
- When a Next.js server (which runs on Node.js) receives a request, it uses the request’s URL to match it with a specific page (a React component).
- The page can request data from a database or an API, and the server waits for that data to load.
- The Next.js app then generates HTML and CSS from the received data and the page’s React components, and sends them to the browser.
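The three steps above can be modelled without any framework. This is a simplified sketch, not Next.js code. The route patterns, the in-memory fetchPost data source and the string-based rendering are all assumptions made for illustration. In the Next.js Pages Router, the data-fetching step corresponds to exporting getServerSideProps from a page file.

```typescript
interface Post {
  id: string;
  title: string;
}

// Hypothetical data source standing in for a database or API call.
async function fetchPost(id: string): Promise<Post | undefined> {
  const posts: Record<string, Post> = { "1": { id: "1", title: "Hello, SEO" } };
  return posts[id];
}

interface Route {
  pattern: RegExp;
  // Loads the page's data and renders it; undefined means "no such page".
  render: (params: string[]) => Promise<string | undefined>;
}

const routes: Route[] = [
  { pattern: /^\/$/, render: async () => "<h1>Home</h1>" },
  {
    pattern: /^\/posts\/([^/]+)$/,
    render: async ([id]) => {
      const post = await fetchPost(id); // step 2: wait for the page's data
      // Step 3: render HTML (titles are trusted here; escape user content in real code).
      return post ? `<h1>${post.title}</h1>` : undefined;
    },
  },
];

// Runs once per request, so the HTML always reflects the latest data.
async function handleRequest(pathname: string): Promise<{ status: number; html: string }> {
  for (const route of routes) {
    const match = route.pattern.exec(pathname); // step 1: match the URL to a page
    if (!match) continue;
    const body = await route.render(match.slice(1));
    if (body !== undefined) {
      return { status: 200, html: `<!DOCTYPE html><html><body>${body}</body></html>` };
    }
    break;
  }
  return { status: 404, html: "<!DOCTYPE html><html><body><h1>Not found</h1></body></html>" };
}

handleRequest("/posts/1").then((res) => console.log(res.status, res.html));
```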
How to Make a Website SEO Friendly Using Gatsby
The process of optimising a React app for search engines occurs in two phases:
- Generating a Static HTML Webpage During the Development Stage
- Processing Requests During Runtime
Generating a Static HTML Webpage During the Development Stage
Let’s take a closer look at the build process.
Here’s how the process works:
- The Gatsby bundling tool receives the pages’ data from sources such as an API, the file system, or a CMS.
- During deployment, or as a step in a CI/CD pipeline, the bundling tool generates HTML/CSS from the React components and the data it received.
- After compilation, the tool creates a folder for each page (for example, an about folder for the About page) containing an index.html file.
The website will only contain static files that you can store in the cloud or host with any hosting service.
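As a rough illustration of that build step, the sketch below renders every page once, ahead of time. It produces a map from output file paths to HTML, following the one-folder-per-page layout described above. The page data and the renderPage, outputFile and buildStaticSite names are hypothetical. Writing the files to disk (Gatsby writes them to its public folder) is left out to keep the example self-contained.

```typescript
interface PageData {
  path: string; // URL path, e.g. "/about"
  title: string;
  body: string;
}

// Render a single page to a complete HTML document.
function renderPage(page: PageData): string {
  return `<!DOCTYPE html><html><head><title>${page.title}</title></head>` +
    `<body><h1>${page.title}</h1><p>${page.body}</p></body></html>`;
}

// Map a URL path to an output file: "/" -> "index.html", "/about" -> "about/index.html".
function outputFile(path: string): string {
  const trimmed = path.replace(/^\/+|\/+$/g, "");
  return trimmed === "" ? "index.html" : `${trimmed}/index.html`;
}

// Build step: all data is fetched and all HTML is generated once, before deployment.
function buildStaticSite(pages: PageData[]): Map<string, string> {
  const files = new Map<string, string>();
  for (const page of pages) {
    files.set(outputFile(page.path), renderPage(page));
  }
  return files;
}

// Hypothetical content, as if it had been pulled from a CMS or API at build time.
const site = buildStaticSite([
  { path: "/", title: "Home", body: "Welcome." },
  { path: "/about", title: "About", body: "Who we are." },
]);

console.log(Array.from(site.keys()).join(", ")); // index.html, about/index.html
```

Because nothing is rendered per request, any new content only appears after the next build.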
Processing Requests During Runtime
Creating isomorphic apps is considered the most reliable way to make a React app SEO-friendly. But it isn’t the only way to go about it.
Prerendering generates the full HTML of your SPA’s pages ahead of time using headless Chrome and caches those pages on the web server.
One way to go about it is to use a prerendering service such as prerender.io. A prerendering service works by intercepting requests to your website and checking the user agent to determine whether a bot or a human user is viewing it.
That complies with SEO best practices because crawlers can see your website as you originally designed it, but users can still interact with your React app like they would in an installed app.
Prerendering puts a much lighter load on your server than SSR:
- Since prerendering is about making your website available to crawlers, you only need the ability to generate HTML snippets with React components.
- It scales to a single server or to multiple servers.
- Since it only generates HTML snippets, you can consider prerendering a low-budget option compared with SSR in Next.js.
On the downside, most prerendering services aren’t free, and they tend to perform poorly with dynamically changing content.
How to Implement Prerendering Using Prerender.io
Prerender.io regularly scrapes your website’s pages using Chrome. It then stores the rendered HTML pages in its database and gives you an API that you can use to access the generated HTML for every page on your website.
Next, you add a proxy that checks the user agent to determine whether the request comes from an actual user or a bot.
If the proxy identifies the user agent as a search engine bot or another crawler (such as Facebook’s or LinkedIn’s), it sends an API call to get the prerendered HTML from prerender.io and sends that HTML back to the crawler.
It’s that simple.
Even better, you don’t have to write the proxy yourself. Prerender provides configurations for almost all common web servers, including Apache, Nginx, Express, and HAProxy.
The Actual Setup
You can integrate Prerender in two ways:
- You can integrate it into your backend using Prerender “middleware.” This option works like Node.js middleware.
Middleware is a small piece of code that runs on every request. However, it only changes the response when it detects a bot (such as Googlebot).
In that case, it fetches the prerendered page and responds with that.
- The second option is to integrate it into your CDN. In that case, Prerender isn’t integrated as middleware but as a set of well-crafted routing rules between the website’s backend and Prerender’s cloud service.
In both options, the rewrite target is a simple URL concatenation (https://service.prerender.io/<YOUR_URL>). That makes it super-easy to test, and the best part is that testing doesn’t require any integration at all.
Just issue the curl command below, and you’re good to go:
curl -H "X-Prerender-Token: <YOUR_PRERENDER_TOKEN>" https://service.prerender.io/https://www.example.com/
Each of these two options comes with its own pros and cons. For the middleware option:
- Pro: flexibility. It runs as actual code, which gives you granular control over what happens and exactly when.
- Con: caches and CDNs in front of your web server may interfere with the response and even serve unwanted results.
You can mitigate this risk by integrating Prerender into your CDN instead. The catch is that not all CDNs can route requests that way, or the way you want them to.
The Prerender middleware you install on your server analyses every incoming request to see whether it comes from a crawler or bot. If it does, the middleware sends a call to prerender.io to request the prerendered HTML and responds with it.
If not, the request continues along the regular server route. Crawlers can’t tell that you’re using Prerender, because the request always goes through your server.
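Here is a framework-agnostic sketch of that decision logic. The bot pattern is a short, hypothetical sample; the official middleware maintains a longer list. The request shape is simplified, and YOUR_PRERENDER_TOKEN is a placeholder. For Express, Prerender’s prerender-node package already implements this, so treat the sketch as an illustration of the flow rather than a replacement.

```typescript
// Simplified request shape (an assumption; real frameworks expose more fields).
interface IncomingRequest {
  method: string;
  url: string; // absolute URL, e.g. "https://www.example.com/pricing"
  userAgent: string;
}

type Decision =
  | { action: "prerender"; target: string; headers: Record<string, string> }
  | { action: "next" };

// Short, hypothetical sample of crawler user-agent fragments.
const BOT_PATTERN = /googlebot|bingbot|facebookexternalhit|linkedinbot|twitterbot/i;

// Static assets never need prerendering.
const STATIC_ASSET = /\.(js|css|png|jpe?g|gif|svg|ico|woff2?)$/i;

function decide(req: IncomingRequest, token: string): Decision {
  // Path without scheme, host, query string or fragment.
  const pathname = req.url.replace(/^https?:\/\/[^/]+/, "").split(/[?#]/)[0];
  const isBot = BOT_PATTERN.test(req.userAgent);
  if (req.method !== "GET" || !isBot || STATIC_ASSET.test(pathname)) {
    // Regular users and non-page requests continue along the normal server route.
    return { action: "next" };
  }
  // Bots get the prerendered snapshot; the target is a plain URL concatenation.
  return {
    action: "prerender",
    target: `https://service.prerender.io/${req.url}`,
    headers: { "X-Prerender-Token": token },
  };
}

const fromGoogle = decide(
  {
    method: "GET",
    url: "https://www.example.com/pricing",
    userAgent: "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
  },
  "YOUR_PRERENDER_TOKEN",
);
console.log(fromGoogle.action === "prerender" ? fromGoogle.target : "next");
// https://service.prerender.io/https://www.example.com/pricing
```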
How to Test Your Middleware?
You can test a prerendered page to see how crawlers view it.
Start by setting your web browser’s user agent to Googlebot and loading your website. Alternatively, run the following command, replacing www.example.com with your website’s URL.
- curl -A Googlebot https://www.example.com/
If you can’t see a rendered version of your page, the middleware most likely wasn’t set up correctly.
Testing the Site Middleware with a Local Development Server
Prerender’s core is open source, so you can run your own Prerender server on a local development machine:
- git clone https://github.com/prerender/prerender.git
- cd prerender
- npm install
- node server.js
- The server runs on port 3000 by default. If needed, you can change the port by setting the PORT environment variable (for example, export PORT=1337).
Make sure your Prerender server is running. Once it is, you can prerender a page by appending its URL to the server’s address, for example with curl http://localhost:3000/https://www.example.com/.
Replace www.example.com with your URL (you can also use a locally hosted URL).
Note that opening the prerendered page in a browser may give you misleading results, because it may fail to load all the necessary resources (CSS in particular).
When to Prerender Your Web Project
You don’t have to prerender every single one of your web projects. Prerendering is only recommended if your website is a Single Page App (SPA) and you want it to be available for bots to crawl.
If your content is behind a login screen, prerendering won’t help you, because search engine bots can’t get past the login page.
Note that Prerender is available both as a paid service and as an open-source project you can host yourself.
7 Best SEO Practices for React
These SEO practices will help you make your React website more SEO-friendly:
React Router Usage
React apps typically follow the single-page application approach. To get the most out of that model without hurting SEO, define the relevant SEO rules and elements properly on each page.
According to Google, its crawlers can’t read URLs with a hash sign (#). In other words, they can’t index React-generated views that are only reachable through hash-based URLs such as example.com/#/about.
Instead, give each view a URL that opens as a separate page. React Router’s BrowserRouter does this using the HTML5 History API, producing clean URLs without a hash.
Also, avoid using setTimeout to load your content. Googlebot may leave your website when it finds your content hard to read.
Googlebot treats URLs as case sensitive.
When Googlebot sees the same URL in both lower and upper case, for example /about and /About, it takes them to be two separate pages, not one.
If both pages lead to the same content, Google will interpret it as content duplication.
The simplest way to avoid this duplication is to make a habit of writing all your URLs in lower case.
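If mixed-case URLs already exist, a common fix is to redirect them permanently (301) to their lower-case form, so only one version gets indexed. The sketch below lower-cases only the path, since query strings can legitimately be case sensitive. lowercaseRedirect and splitQuery are hypothetical helper names.

```typescript
// Split "/path?query#hash" into the path and the rest (query and fragment kept as-is).
function splitQuery(url: string): [string, string] {
  const i = url.search(/[?#]/);
  return i === -1 ? [url, ""] : [url.slice(0, i), url.slice(i)];
}

// Returns a 301 redirect to the lower-case path, or null if the path is already lower case.
function lowercaseRedirect(url: string): { status: 301; location: string } | null {
  const [path, rest] = splitQuery(url);
  const lower = path.toLowerCase();
  if (lower === path) return null;
  return { status: 301, location: lower + rest };
}

console.log(lowercaseRedirect("/Blog/React-SEO?ref=Home")); // redirect to "/blog/react-seo?ref=Home"
console.log(lowercaseRedirect("/blog/react-seo"));          // null
```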
Using Href in Links
Don’t rely on click handlers alone for links. Instead, give every link an “href.”
That’s because Googlebot isn’t designed to follow links that you only provide through onclick.
So make sure you’ve defined every link with an href. This makes it easier for Googlebot to identify the relevant pages on your website and visit them.
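To illustrate why the href matters, the sketch below collects the URLs a crawler could follow from a snippet of markup. It is a naive, regex-based approximation; real crawlers use a full HTML parser. It still shows that a link without an href simply offers no URL to follow.

```typescript
// Collect the href values of <a> elements; elements without an href are ignored.
function crawlableLinks(html: string): string[] {
  const links: string[] = [];
  const anchor = /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["'][^>]*>/gi;
  let match: RegExpExecArray | null;
  while ((match = anchor.exec(html)) !== null) {
    links.push(match[1]);
  }
  return links;
}

const markup = `
  <a href="/pricing">Pricing</a>
  <span onclick="navigate('/features')">Features</span>
  <a onclick="navigate('/blog')">Blog</a>
`;

console.log(crawlableLinks(markup).join(",")); // /pricing
```

In React, rendering links as a real <a href> keeps them discoverable. React Router’s <Link> component does this for you.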
One crucial SEO element you cannot afford to ignore is metadata.
So it’s only logical that it should appear in your source code, even when using React. Copy-pasting the same title and meta description across pages won’t help your SEO much.
The best way to go about it is to create a suitable meta description and title for each page.
This is where React Helmet comes in: it lets each page component set its own title and meta tags in the document head.
Keep each description within 160 characters; if it’s longer, slice the content to 160 characters.
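React Helmet manages these tags from inside your components. To show what should end up in each page’s head, here is a framework-free sketch. It builds a unique title and meta description per page and slices long descriptions to 160 characters. PageMeta, buildHeadTags, truncateDescription and escapeAttr are hypothetical names. The escaping is there because titles and descriptions often come from a CMS.

```typescript
interface PageMeta {
  title: string;
  description: string;
}

const MAX_DESCRIPTION = 160;

// Escape characters that would break out of an attribute value or element.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Slice descriptions longer than the limit down to MAX_DESCRIPTION characters.
function truncateDescription(text: string, max: number = MAX_DESCRIPTION): string {
  return text.length <= max ? text : text.slice(0, max);
}

// Build the <head> tags for one page; every page should get its own values.
function buildHeadTags(meta: PageMeta): string {
  return [
    `<title>${escapeAttr(meta.title)}</title>`,
    `<meta name="description" content="${escapeAttr(truncateDescription(meta.description))}">`,
  ].join("\n");
}

console.log(
  buildHeadTags({
    title: "Pricing | Example",
    description: "Compare plans and pick the one that fits your team.",
  }),
);
```

With React Helmet, the same two tags would sit inside a <Helmet> element in the page component.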
Be sure to keep the structured data items in your source code alongside the metadata.
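Structured data is commonly added as a JSON-LD script tag. The sketch below assumes the page is an article. The @context and @type values follow the schema.org vocabulary, while the headline and date are placeholders. Escaping < keeps a stray </script> in your data from closing the tag early.

```typescript
// Serialize schema.org data into a JSON-LD <script> tag for the page's source code.
function jsonLdScript(data: Record<string, unknown>): string {
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}

const articleLd = jsonLdScript({
  "@context": "https://schema.org",
  "@type": "Article",
  headline: "React and SEO",   // placeholder value
  datePublished: "2024-01-01", // placeholder value
});

console.log(articleLd);
```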
404 Code Error
Make sure all broken or non-existent pages on your website return a 404 status code.
That’s a gentle reminder to configure files like route.js and server.js so that unknown routes respond with a 404.
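A common pitfall with client-rendered React apps is the “soft 404.” The server answers every path with status 200 and the same HTML shell, and only the browser displays a “not found” message. Here is a minimal sketch of the server-side fix. It uses a hypothetical list of route patterns kept in sync with your client-side routes. The server still sends the shell, but with a real 404 status for unknown paths.

```typescript
// Route patterns the app knows about (hypothetical; keep in sync with your client routes).
const KNOWN_ROUTES: RegExp[] = [/^\/$/, /^\/about$/, /^\/posts\/[a-z0-9-]+$/];

const APP_SHELL =
  '<!DOCTYPE html><html><body><div id="root"></div><script src="/app.js"></script></body></html>';

interface ShellResponse {
  status: number;
  body: string;
}

// Serve the app shell for every path, but with the correct status code.
function serveShell(pathname: string): ShellResponse {
  const known = KNOWN_ROUTES.some((route) => route.test(pathname));
  return { status: known ? 200 : 404, body: APP_SHELL };
}

console.log(serveShell("/about").status);           // 200
console.log(serveShell("/posts/react-seo").status); // 200
console.log(serveShell("/no-such-page").status);    // 404
```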
Specify on-page images with “img src.”
According to Search Engine Roundtable, if you don’t specify your on-page images with an img tag and a src attribute, Googlebot will have difficulty indexing them.
Using CSS backgrounds for images in React will also hamper your efforts to get these images indexed.
React lazyload is an excellent way to make your website load faster, and it can also improve your page’s speed score.
The react-lazyload package is available on npm.
You also want to take full advantage of React Snap, which prerenders your pages into static HTML, to boost your website’s speed.
Code splitting also helps: it keeps the browser from loading code that a page doesn’t need, which increases page speed.
In simpler terms, if you have, say, 2 MB of JavaScript files, you can split them into chunks of 60 or 70 KB that are loaded separately, only when they’re needed.
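The arithmetic behind that example is easy to sketch. The toy function below greedily packs module sizes (in KB) into chunks no larger than a chosen limit. Real bundlers such as webpack split mainly at dynamic import() boundaries. They can also apply size limits through configuration, for example splitChunks.maxSize, so treat this only as an illustration of the numbers.

```typescript
// Greedily pack module sizes (in KB) into chunks that stay within maxChunkKb.
function splitIntoChunks(moduleSizesKb: number[], maxChunkKb: number): number[][] {
  const chunks: number[][] = [];
  let current: number[] = [];
  let currentSize = 0;
  for (const size of moduleSizesKb) {
    // Start a new chunk when this module would push the current one over the limit.
    if (current.length > 0 && currentSize + size > maxChunkKb) {
      chunks.push(current);
      current = [];
      currentSize = 0;
    }
    current.push(size);
    currentSize += size;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Hypothetical bundle: 2 MB (2048 KB) made of 128 modules of 16 KB each.
const modules = new Array<number>(128).fill(16);
const chunks = splitIntoChunks(modules, 64);
console.log(chunks.length); // 32
```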
Google crawls and indexes plain HTML pages without much trouble. The problem kicks in with single-page applications that serve little to no HTML content, just a JavaScript script tag.
You have to understand the challenges you’re facing and then overcome them using the tactics covered in this article.