Adding a Sitemap with Gatsby

A sitemap lists the URLs on a site so that search engines can crawl and index its pages efficiently. It’s particularly useful on large or complex sites, where some pages might otherwise be hard to discover, and it helps new or updated content get picked up promptly. A sitemap doesn’t guarantee a higher ranking, but better indexing generally means better visibility in search results.

🚀 TL;DR See links below.

Install & Configure

The gatsby-plugin-sitemap plugin makes it simple to add a sitemap. We’ll build on the site created in a previous post. Add gatsby-plugin-sitemap as a dependency in package.json (for example, by running npm install gatsby-plugin-sitemap).

Add gatsby-plugin-sitemap to gatsby-config.js as a plugin:

plugins: [
  `gatsby-plugin-sitemap`
]
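
The default setup also relies on the site’s base URL being available as siteMetadata.siteUrl, since that is the field the plugin’s default query reads. A minimal gatsby-config.js might look something like the sketch below. The URL is just the example domain used in this post, so substitute your own.

```javascript
// gatsby-config.js (minimal sketch; the siteUrl value is a placeholder)
const config = {
  siteMetadata: {
    // Base URL of the deployed site, read by the sitemap plugin's default query.
    siteUrl: "https://www.whimsyweb.dev",
  },
  plugins: ["gatsby-plugin-sitemap"],
};

module.exports = config;
```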

Build

Build the site (gatsby build), then look in public/. You’ll find (at least) two new files:

  • sitemap-index.xml and
  • sitemap-0.xml.

If you have a large site then there may be more files named sitemap-1.xml, sitemap-2.xml, sitemap-3.xml etc. These files will contain the actual sitemap data, which is essentially just a list of all of the pages in the site. The sitemap-index.xml file acts as an index to these files.
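
To check what ended up in a sitemap file, you could pull the URLs out of its <loc> elements. The sketch below uses a hypothetical helper, extractLocs(), and a hand-written XML sample rather than real plugin output. A regular expression is enough for a quick inspection, although a proper XML parser would be more robust.

```javascript
// Hypothetical helper: collect the contents of every <loc> element in a sitemap.
function extractLocs(xml) {
  return Array.from(
    xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g),
    (match) => match[1]
  );
}

// Hand-written sample in the standard sitemap format (not actual plugin output).
const sampleSitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.whimsyweb.dev/</loc></url>
  <url><loc>https://www.whimsyweb.dev/what-is-gatsby/</loc></url>
</urlset>`;

console.log(extractLocs(sampleSitemap));
```

With real output you would read the file first, for example with fs.readFileSync("public/sitemap-0.xml", "utf8"), and pass the resulting string to the helper.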

🚀 TL;DR Show me the code. Look at the 3-sitemap-simple branch.

Tweaking

In many cases the default configuration will work perfectly. However, if you need something more specialised, the plugin offers plenty of options for changing its behaviour.

GraphQL

The default GraphQL query for the sitemap is:

{
  site {
    siteMetadata {
      siteUrl
    }
  }
  allSitePage {
    nodes {
      path
    }
  }
}

If the sitemap you get out of the box doesn’t quite meet your needs then you might need to tweak this query. You might also want to filter the pages included in the sitemap. Both of these can be achieved by tweaking gatsby-config.js.
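
For simple filtering you may not need a custom query at all. Recent versions of the plugin accept an excludes option listing paths to leave out of the sitemap. The sketch below assumes such a version and uses a made-up path, /thank-you/, purely for illustration; check the plugin README for the version you have installed.

```javascript
// gatsby-config.js (sketch; the excluded path is a hypothetical example)
const config = {
  siteMetadata: { siteUrl: "https://www.whimsyweb.dev" },
  plugins: [
    {
      resolve: "gatsby-plugin-sitemap",
      options: {
        // Paths listed here are left out of the generated sitemap.
        excludes: ["/thank-you/"],
      },
    },
  ],
};

module.exports = config;
```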

GraphQL Query Returns Edges

Suppose that for some reason we wanted the GraphQL query to return edges rather than nodes.

{
  site {
    siteMetadata {
      siteUrl
    }
  }
  allSitePage {
    edges {
      node {
        path
      }
    }
  }
}

No problem: just specify query in the plugin options. The plugin still expects to receive a list of nodes, though, so we also need to use resolvePages to map the edges to nodes.

plugins: [
  {
    resolve: "gatsby-plugin-sitemap",
    options: {
      output: "/",
      query: `
      {
        site {
          siteMetadata {
            siteUrl
          }
        }
        allSitePage {
          edges {
            node {
              path
            }
          }
        }
      }`,
      resolvePages: data => data.allSitePage.edges.map(edge => edge.node),
    },
  }
]
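
To see what the resolvePages mapping does, you can run it against a small, made-up query result in plain Node. The sample data below is illustrative only and doesn’t come from a real build.

```javascript
// Same mapping as the resolvePages option above.
const resolvePages = (data) => data.allSitePage.edges.map((edge) => edge.node);

// Made-up query result in the edges shape.
const sampleData = {
  site: { siteMetadata: { siteUrl: "https://www.whimsyweb.dev" } },
  allSitePage: {
    edges: [
      { node: { path: "/" } },
      { node: { path: "/what-is-gatsby/" } },
    ],
  },
};

console.log(resolvePages(sampleData));
// → [ { path: '/' }, { path: '/what-is-gatsby/' } ]
```

Each edge wrapper is discarded, leaving the flat list of nodes that the rest of the plugin works with.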

🚀 TL;DR Show me the code. Look at the 4-sitemap-graphql-edges branch.

Exclude Drafts

If some pages are marked as drafts then we probably don’t want them to appear in the sitemap. Drafts can be flagged by adding a page-draft attribute to the AsciiDoc header, and that attribute can then be extracted via GraphQL (where it appears as pageAttributes.draft).

{
  site {
    siteMetadata {
      siteUrl
    }
  }
  allAsciidoc {
    nodes {
      pageAttributes {
        draft
      }
      fields {
        slug
      }
    }
  }
}

This is what the GraphQL result looks like:

{
  "site": {
    "siteMetadata": {
      "siteUrl": "https://www.whimsyweb.dev"
    }
  },
  "allAsciidoc": {
    "nodes": [
      {
        "pageAttributes": {
          "draft": null
        },
        "fields": {
          "slug": "/what-is-asciidoc/"
        }
      },
      {
        "pageAttributes": {
          "draft": "true"
        },
        "fields": {
          "slug": "/what-is-gatsby/"
        }
      },
      {
        "pageAttributes": {
          "draft": null
        },
        "fields": {
          "slug": "/what-is-tailwind/"
        }
      }
    ]
  }
}

Update the plugin configuration in gatsby-config.js:

plugins: [
  {
    resolve: "gatsby-plugin-sitemap",
    options: {
      query: `
        {
          site {
            siteMetadata {
              siteUrl
            }
          }
          allAsciidoc {
            nodes {
              pageAttributes {
                draft
              }
              fields {
                slug
              }
            }
          }
        }`,
      resolvePages: data => sitemapQuery(data),
      serialize: ({ path }) => {
        return {
          url: path,
          changefreq: "monthly",
          priority: 0.5,
        };
      }
    }
  }
]

The sitemapQuery() function (invoked in the resolvePages field) filters out the draft posts. It also manually adds in an item for the site landing page that’s not included via the GraphQL query.

function sitemapQuery(data) {
  const posts = data.allAsciidoc.nodes
    // Exclude draft posts: any non-null draft attribute marks a post as a draft.
    .filter((node) => node.pageAttributes.draft == null)
    // Keep only the path, which is the field that serialize reads.
    .map((node) => ({ path: node.fields.slug }));

  // Add landing page manually since it's not included in the GraphQL results.
  const home = { path: "/" };

  return [...posts, home];
}
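
As a quick sanity check, the function can be run in plain Node against the GraphQL result shown earlier. The snippet below repeats the function’s logic in condensed form so that it runs on its own, and trims the sample data to the fields the function actually reads.

```javascript
// Condensed copy of sitemapQuery(), repeated so that this snippet is self-contained.
function sitemapQuery(data) {
  const posts = data.allAsciidoc.nodes
    .filter((node) => node.pageAttributes.draft == null)
    .map((node) => ({ path: node.fields.slug }));
  return [...posts, { path: "/" }];
}

// The GraphQL result shown earlier, trimmed to the fields the function reads.
const result = {
  allAsciidoc: {
    nodes: [
      { pageAttributes: { draft: null }, fields: { slug: "/what-is-asciidoc/" } },
      { pageAttributes: { draft: "true" }, fields: { slug: "/what-is-gatsby/" } },
      { pageAttributes: { draft: null }, fields: { slug: "/what-is-tailwind/" } },
    ],
  },
};

const pages = sitemapQuery(result);
console.log(pages.map((page) => page.path));
// → [ '/what-is-asciidoc/', '/what-is-tailwind/', '/' ]
```

The draft post, /what-is-gatsby/, is missing from the list. Because the check is draft == null, a header that sets page-draft to any value, even false, would also be treated as a draft.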

🚀 TL;DR Show me the code. Look at the 5-sitemap-filter-draft branch.

Conclusion

Adding a sitemap to your site is likely to help search engines index it, and there’s nothing to lose by having one. With almost no configuration, the gatsby-plugin-sitemap plugin will generate a basic sitemap that includes all pages on the site. If you need more control, its options make it possible to customise the sitemap to your requirements.

The code for this post can be found here.