There are a couple of files which can have an impact on the SEO performance of a site: (1) a sitemap and (2) a robots.txt. In a previous post we set up a sitemap which includes only the canonical pages on the site. In this post we’ll add a robots.txt.
A Gatsby site will not have a robots.txt file by default. There’s a handy package which makes adding one simple though. We’ll take a look at how to add it to the site and a couple of ways to configure it too.
🚀 TL;DR
Show me the code. Look at the 15-robots branch. This site is deployed here.
What’s robots.txt?
The robots.txt file is a guide for search engine crawlers. It indicates which parts of a site crawlers are allowed or forbidden to crawl and index. It can help to ensure that a site is indexed accurately and should prevent sensitive or duplicate content from being crawled and indexed.
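To give a feel for the format, here’s a minimal (hypothetical) robots.txt — the path and URL are placeholders, not from this site:
User-agent: *
Disallow: /private/
Sitemap: https://www.example.com/sitemap-index.xml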
Add Package
Add the gatsby-plugin-robots-txt package to package.json. Then reinstall site packages with npm or yarn.
npm install
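Alternatively, installing the package directly will add it to package.json for you (use yarn add instead if the site uses yarn):
npm install gatsby-plugin-robots-txt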
Configure Package
The simplest configuration would involve just adding gatsby-plugin-robots-txt to the list of plugins in gatsby-config.js.
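As a rough sketch, that minimal setup in gatsby-config.js might look something like this (the URL is a placeholder; the plugin can use siteUrl from siteMetadata to build its defaults):
module.exports = {
  siteMetadata: {
    // used by the plugin to build default host and sitemap entries
    siteUrl: 'https://www.example.com'
  },
  plugins: ['gatsby-plugin-robots-txt']
}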
You can also add some details to the plugin configuration:
{
  resolve: 'gatsby-plugin-robots-txt',
  options: {
    policy: [
      {
        userAgent: 'Googlebot',
        allow: '/',
        crawlDelay: 5
      },
      {
        userAgent: 'bingbot',
        allow: '/'
      },
      {
        userAgent: 'CCBot',
        disallow: '/'
      },
      {
        userAgent: '*',
        allow: '/'
      }
    ]
  }
}
Here we’ve set up different policies for three specific bots (Google, Bing and Common Crawl) and another policy for all other bots. More examples of the information that can be included in the policy can be found here.
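As one further sketch, a policy entry can mix allow and disallow rules for particular paths — the paths below are hypothetical, just to illustrate the shape:
{
  resolve: 'gatsby-plugin-robots-txt',
  options: {
    policy: [
      {
        userAgent: '*',
        allow: '/',
        // hypothetical paths, to show that disallow can list several entries
        disallow: ['/admin/', '/drafts/']
      }
    ]
  }
}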
There are a couple of other things that can be configured here:
host — the site URL (generally obtained from siteUrl); and
sitemap — the location of a sitemap (the default will work immediately for sitemaps generated via gatsby-plugin-sitemap).
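If you do want to set these explicitly, they are just two more keys in options. The URLs here are placeholders:
{
  resolve: 'gatsby-plugin-robots-txt',
  options: {
    host: 'https://www.example.com',
    sitemap: 'https://www.example.com/sitemap-index.xml',
    policy: [{ userAgent: '*', allow: '/' }]
  }
}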
You can also specify different configurations depending on the value of the GATSBY_ACTIVE_ENV environment variable. This can be useful, for example, if you want to have different content in robots.txt depending on whether it’s a production, development or preview build.
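Here’s a sketch of what that can look like using the plugin’s env and resolveEnv options — the policies themselves are just illustrative (block everything in development, allow everything in production):
{
  resolve: 'gatsby-plugin-robots-txt',
  options: {
    // choose the configuration based on GATSBY_ACTIVE_ENV, falling back to NODE_ENV
    resolveEnv: () => process.env.GATSBY_ACTIVE_ENV || process.env.NODE_ENV,
    env: {
      development: {
        policy: [{ userAgent: '*', disallow: ['/'] }]
      },
      production: {
        policy: [{ userAgent: '*', allow: '/' }]
      }
    }
  }
}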
This is what the resulting robots.txt looks like:
User-agent: Googlebot
Allow: /
Crawl-delay: 5
User-agent: bingbot
Allow: /
User-agent: CCBot
Disallow: /
User-agent: *
Allow: /
Sitemap: https://gatsby-whimsyweb-15-robots.netlify.app/sitemap-index.xml
Host: https://gatsby-whimsyweb-15-robots.netlify.app
You can see the live version of this file here.
Conclusion
Adding a robots.txt file to a Gatsby site is quick and easy. It is likely to improve the site’s SEO performance (perhaps only slightly?). And it can certainly do no harm (provided that you don’t block an important crawler bot!).