Astro Sitemaps: Add Post and Page XML Sitemaps

Rodney Lab - Dec 6 '22 - Dev Community

🚀 Astro XML Sitemaps

In this post we look at a couple of ways to set up Astro sitemaps. First we use the Astro sitemap integration. Then we see how you get more fine-grained control by creating your own sitemaps on resource routes. Along the way we see that adding these, and even custom styling, is not at all difficult. Before we get into that, though, we take a look at why XML sitemaps are important and which fields you need in 2022. If that sounds like what you came here for, then let’s crack on!

šŸ¤·šŸ½ Astro Sitemaps: why add an XML Sitemap?

XML sitemaps are great for Search Engine Optimisation (SEO) as they provide an easy way for search engines to determine what content is on your site and when it was last updated. This last part is important as it can save the search engine from crawling a site which has not been updated since the last crawl.

Crawling is the process by which search engines discover sites and also attempt to work out what a particular page is about. While crawling, the search engine bot looks for anchor tags. It uses its existing data about the linked site, as well as the text between the opening and closing anchor tags, to work out what your site and the linked site are about. In short, the crawl is about finding links and updating the search engine’s index. Each site gets a crawl budget, which varies based on a number of factors and caps the time a search engine will spend crawling that site. Without an up-to-date sitemap, you risk the search engine spending that budget recrawling existing content without discovering and indexing your new pages.

<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="https://example.com/sitemap.xsl"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd http://www.google.com/schemas/sitemap-image/1.1 http://www.google.com/schemas/sitemap-image/1.1/sitemap-image.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/best-medium-format-camera-for-starting-out/</loc>
    <lastmod>2022-11-12T09:17:52.000Z</lastmod>
  </url>
  <url>
    <loc>https://example.com/folding-camera/</loc>
    <lastmod>2022-11-13T09:17:52.000Z</lastmod>
  </url>
  <url>
    <loc>https://example.com/twin-lens-reflex-camera/</loc>
    <lastmod>2022-11-14T09:17:52.000Z</lastmod>
  </url>
</urlset>

In the example sitemap content above, we see three entries. Each has a loc tag, which contains the URL of a site page, as well as lastmod, the last modification date for that page. By scanning this sitemap, the search engine can work out which content you updated since its last index, which saves it re-crawling unchanged content. On a larger site, bots might also discover fresh content more quickly.
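To make the search engine’s side of this concrete, here is a minimal JavaScript sketch of the comparison a crawler could make. It assumes the sitemap has already been parsed into plain { loc, lastmod } objects and that the crawler remembers when it last visited; the changedSince name is hypothetical, and real search engines do not publish their exact logic.

```javascript
// Hypothetical crawler-side check: which sitemap entries changed since the last crawl?
// Assumes the XML was already parsed into { loc, lastmod } objects.
function changedSince(entries, lastCrawl) {
  const lastCrawlTime = Date.parse(lastCrawl);
  return entries
    .filter(({ lastmod }) => {
      const modified = Date.parse(lastmod);
      // A missing or unparseable lastmod gives NaN: treat it as "possibly changed".
      return Number.isNaN(modified) || modified > lastCrawlTime;
    })
    .map(({ loc }) => loc);
}

// The three entries from the example sitemap above.
const entries = [
  { loc: 'https://example.com/best-medium-format-camera-for-starting-out/', lastmod: '2022-11-12T09:17:52.000Z' },
  { loc: 'https://example.com/folding-camera/', lastmod: '2022-11-13T09:17:52.000Z' },
  { loc: 'https://example.com/twin-lens-reflex-camera/', lastmod: '2022-11-14T09:17:52.000Z' },
];

// Logs the folding-camera and twin-lens-reflex-camera URLs, the only ones updated after this crawl time.
console.log(changedSince(entries, '2022-11-12T12:00:00.000Z'));
```

Only the two pages modified after the previous visit come back, so those are the ones worth re-crawling.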

🤔 Which fields do you need in 2022?

On older sitemaps you might see priority and changefreq tags. Although search engines used these in the past, they no longer matter much to Google. For that reason we will skip those tags in the rest of this post.

🧱 How to add an XML Astro Sitemap with the Integration

Astro integrations let you quickly add certain features to your site and typically need little or no configuration. Here we see how to set up the Astro sitemap integration. If you already know you need something more sophisticated, skip on to the next section.

  • Like other integrations, the astro add command helps you get going quickly on the sitemap integration:
pnpm astro add sitemap

When prompted, type Y to accept installing the integration and adding the necessary config to your astro.config.mjs file.

  • Update the site field in your astro.config.mjs to match your site’s domain:
import sitemap from '@astrojs/sitemap';
import svelte from '@astrojs/svelte';
import { defineConfig } from 'astro/config';

// https://astro.build/config
export default defineConfig({
  site: 'https://your-site-domain.com',
  integrations: [sitemap(), svelte()]
});
  • As a final step, you can update the HTTP headers for the final Astro sitemap (more on this later). The method will depend on whether you build your site as SSG (Static Site Generated, the Astro default) or SSR (Server-Side Rendered).

To see the sitemaps, build your site as normal (pnpm run build), then run the preview server (pnpm run preview). You can see the sitemaps at http://localhost:3001/sitemap-index.xml and http://localhost:3001/sitemap-0.xml. The first is an index which just links to the second. The second lists your site pages, including dynamic pages like posts (depending on your site structure). You can also inspect these two files in the project dist folder.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:news="http://www.google.com/schemas/sitemap-news/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
    <url>
        <loc>https://example.com/</loc>
    </url>
    <url>
        <loc>https://example.com/best-medium-format-camera-for-starting-out/</loc>
    </url>
    <url>
        <loc>https://example.com/contact/</loc>
    </url>
    <url>
        <loc>https://example.com/folding-camera/</loc>
    </url>
    <url>
        <loc>https://example.com/twin-lens-reflex-camera/</loc>
    </url>
</urlset>

You will notice there are no dates by default. It is pretty easy to add a lastmod date parameter, though this will be the same date for all pages. If you want something more sophisticated, you can supply functions in the configuration, which generate the data you want. My opinion is that for this use case, it makes more sense to add a resource route with your own custom XML sitemap. This should keep the sitemaps more maintainable. We see this in the next section.
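If you do want to stay with the integration, the function-based approach mentioned above could look roughly like the sketch below. It assumes the serialize option of @astrojs/sitemap, which receives each sitemap item (with its full url) and returns the item to write; lastUpdatedByPath and withLastmod are hypothetical names, and in a real project you would fill the map from your post frontmatter. Check the integration documentation for your version before relying on this.

```javascript
// Hypothetical lookup of last-updated dates, keyed by URL pathname.
// In a real project you might build this from Markdown frontmatter.
const lastUpdatedByPath = new Map([
  ['/folding-camera/', '2022-11-13T09:17:52.000Z'],
  ['/twin-lens-reflex-camera/', '2022-11-14T09:17:52.000Z'],
]);

// Adds a lastmod to a sitemap item when we know that page's update date.
// Items with no known date are returned unchanged.
function withLastmod(item, lookup = lastUpdatedByPath) {
  const { pathname } = new URL(item.url);
  const lastUpdated = lookup.get(pathname);
  return lastUpdated ? { ...item, lastmod: new Date(lastUpdated).toISOString() } : item;
}

// Assumed wiring in astro.config.mjs:
// integrations: [sitemap({ serialize: (item) => withLastmod(item) }), svelte()]
console.log(withLastmod({ url: 'https://example.com/folding-camera/' }));
```

Pages without an entry in the map (like the contact page here) simply keep the integration’s default output, which is one reason a hand-rolled route can end up easier to maintain.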

🧑🏽‍🍳 Rolling your own Astro Sitemap for Increased Control

We will add three sitemaps, though for your own site you might decide on more or even fewer. All the sitemaps will include lastmod tags to help search engines optimise indexing. The first sitemap will be an index with links to the pages and posts sitemaps. The index is the easiest, with the least dynamic data, so let’s start there.

Astro JS Index Sitemap

Astro does not just create fast HTML pages; you can also use it to create resource routes. We see how to serve JSON data and even PDFs in the Astro Resource route post, so take a look there for further background if you are interested. Let’s start by creating a src/pages/sitemap_index.xml.js file. This will generate the content served when a search engine visits https://example.com/sitemap_index.xml.

import website from '~config/website';

const { siteUrl } = website;

export async function get({ request }) {
    const { url } = request;
    const { hostname, port, protocol } = new URL(url);
    const baseUrl = import.meta.env.PROD ? siteUrl : `${protocol}//${hostname}:${port}`;

    const postModules = await import.meta.glob('../content/posts/**/index.md');
    const posts = await Promise.all(Object.keys(postModules).map((path) => postModules[path]()));
    const lastPostUpdate = posts.reduce((accumulator, { frontmatter: { lastUpdated } }) => {
        const lastPostUpdatedValue = Date.parse(lastUpdated);
        return lastPostUpdatedValue > accumulator ? lastPostUpdatedValue : accumulator;
    }, 0);

    const lastPostUpdateDate = new Date(lastPostUpdate).toISOString();

    const xmlString = `
<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="${baseUrl}/sitemap.xsl"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
        <loc>${baseUrl}/page-sitemap.xml</loc>
        <lastmod>${lastPostUpdateDate}</lastmod>
    </sitemap>
    <sitemap>
        <loc>${baseUrl}/post-sitemap.xml</loc>
        <lastmod>${lastPostUpdateDate}</lastmod>
    </sitemap>
</sitemapindex>`.trim();

    return { body: xmlString };
}

This code is based on the Astro Blog Markdown starter, and we get post modification dates from the Markdown frontmatter for blog posts. The two entries (pages and posts) have the same last modified date because the home page includes a list of recent posts, so we assume the content there gets updated each time a blog post is updated. Because we are adding the logic ourselves, you can adapt it to suit your own use case if that assumption does not hold for your site.

We add our sitemap to a get function, since search engines request it with an HTTP GET. We are assuming static generation here. If your site runs in SSR mode, consider adding an HTTP content-type header (see the Astro resource routes post for details).

Near the top of the function, baseUrl varies depending on whether we are running the site locally in dev mode or in production mode. We then use import.meta.glob to load the posts and get the last date a post was updated, pulling data from post frontmatter. The full code is on the Rodney Lab GitHub repo; see the link further down the page. Most important for your own project is the XML markup in the xmlString template literal. You can even add images and videos there if you want to.
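The dev/production switch for baseUrl could also live in a small helper. This sketch uses the URL origin property instead of joining protocol, hostname and port by hand, which avoids a trailing colon if a request URL has no explicit port. The getBaseUrl name and its arguments are assumptions; in the endpoint you would pass request.url, import.meta.env.PROD and siteUrl.

```javascript
// Hypothetical helper: use the canonical site URL in production,
// otherwise fall back to the origin the request arrived on (e.g. the dev server).
function getBaseUrl(requestUrl, isProduction, siteUrl) {
  if (isProduction) {
    // Strip trailing slashes so paths like `/post-sitemap.xml` can be appended directly.
    return siteUrl.replace(/\/+$/, '');
  }
  // origin is protocol + hostname + port (the port is omitted when it is the default).
  return new URL(requestUrl).origin;
}

console.log(getBaseUrl('http://localhost:3000/sitemap_index.xml', false, 'https://example.com'));
// → http://localhost:3000
```

In the route, the existing baseUrl line would then become a single call, for example getBaseUrl(request.url, import.meta.env.PROD, siteUrl).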

Styling

We also referenced an XSL stylesheet in the xml-stylesheet processing instruction at the top of the XML, just to make the sitemap look a bit nicer for you while debugging! Create a public/sitemap.xsl file with this content:

<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright (c) 2008, Alexander Makarov
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of sitemap nor the names of its
  contributors may be used to endorse or promote products derived from
  this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->
    <xsl:stylesheet version="2.0"
        xmlns:html="http://www.w3.org/TR/REC-html40"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
        xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:template match="/">
        <html xmlns="http://www.w3.org/1999/xhtml">
        <head>
            <title>XML Sitemap</title>
            <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
            <style type="text/css">
                body {
                    font-family: Helvetica, Arial, sans-serif;
                    font-size: 13px;
                    color: #545353;
                }
                table {
                    border: none;
                    border-collapse: collapse;
                }
                #sitemap tr:nth-child(odd) td {
                    background-color: #eee !important;
                }
                #sitemap tbody tr:hover td {
                    background-color: #ccc;
                }
                #sitemap tbody tr:hover td, #sitemap tbody tr:hover td a {
                    color: #000;
                }
                #content {
                    margin: 0 auto;
                    width: 1000px;
                }
                .expl {
                    margin: 18px 3px;
                    line-height: 1.2em;
                }
                .expl a {
                    color: #da3114;
                    font-weight: 600;
                }
                .expl a:visited {
                    color: #da3114;
                }
                a {
                    color: #000;
                    text-decoration: none;
                }
                a:visited {
                    color: #777;
                }
                a:hover {
                    text-decoration: underline;
                }
                td {
                    font-size:11px;
                }
                th {
                    text-align:left;
                    padding-right:30px;
                    font-size:11px;
                }
                thead th {
                    border-bottom: 1px solid #000;
                }
            </style>
        </head>
        <body>
        <div id="content">
            <h1>XML Sitemap</h1>
            <p class="expl">
                This is an XML Sitemap, meant for consumption by search engines.<br/>
                You can find more information about XML sitemaps on <a href="http://sitemaps.org" target="_blank" rel="noopener noreferrer">sitemaps.org</a>.
            </p>
            <hr/>
            <xsl:if test="count(sitemap:sitemapindex/sitemap:sitemap) &gt; 0">
                <p class="expl">
                    This XML Sitemap Index file contains <xsl:value-of select="count(sitemap:sitemapindex/sitemap:sitemap)"/> sitemaps.
                </p>
                <table id="sitemap" cellpadding="3">
                    <thead>
                    <tr>
                        <th width="75%">Sitemap</th>
                        <th width="25%">Last Modified</th>
                    </tr>
                    </thead>
                    <tbody>
                    <xsl:for-each select="sitemap:sitemapindex/sitemap:sitemap">
                        <xsl:variable name="sitemapURL">
                            <xsl:value-of select="sitemap:loc"/>
                        </xsl:variable>
                        <tr>
                            <td>
                                <a href="{$sitemapURL}"><xsl:value-of select="sitemap:loc"/></a>
                            </td>
                            <td>
                                <xsl:value-of select="concat(substring(sitemap:lastmod,0,11),concat(' ', substring(sitemap:lastmod,12,5)),concat('', substring(sitemap:lastmod,20,6)))"/>
                            </td>
                        </tr>
                    </xsl:for-each>
                    </tbody>
                </table>
            </xsl:if>
            <xsl:if test="count(sitemap:sitemapindex/sitemap:sitemap) &lt; 1">
                <p class="expl">
                    This XML Sitemap contains <xsl:value-of select="count(sitemap:urlset/sitemap:url)"/> URLs.
                </p>
                <table id="sitemap" cellpadding="3">
                    <thead>
                    <tr>
                        <th width="80%">URL</th>
                        <th width="5%">Images</th>
                        <th title="Last Modification Time" width="15%">Last Mod.</th>
                    </tr>
                    </thead>
                    <tbody>
                    <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
                    <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
                    <xsl:for-each select="sitemap:urlset/sitemap:url">
                        <tr>
                            <td>
                                <xsl:variable name="itemURL">
                                    <xsl:value-of select="sitemap:loc"/>
                                </xsl:variable>
                                <a href="{$itemURL}">
                                    <xsl:value-of select="sitemap:loc"/>
                                </a>
                            </td>
                            <td>
                                <xsl:value-of select="count(image:image)"/>
                            </td>
                            <td>
                                <xsl:value-of select="concat(substring(sitemap:lastmod,0,11),concat(' ', substring(sitemap:lastmod,12,5)),concat('', substring(sitemap:lastmod,20,6)))"/>
                            </td>
                        </tr>
                    </xsl:for-each>
                    </tbody>
                </table>
            </xsl:if>
        </div>
        </body>
        </html>
    </xsl:template>
    </xsl:stylesheet>

This is based on code in an Alexander Makarov GitHub repo. Try opening the Sitemap in your browser. It should look something like this:

Screenshot: styled XML sitemap showing links to the page and post sitemaps with their last modified dates.
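The Last Modified column comes from the concat(substring(...)) expression in the stylesheet. XPath substring positions start at 1, so this sketch mirrors the expression with zero-based slice calls to show what the column will display. It is only an illustration (formatLastmod is our own name). Note that with the toISOString() dates used in this post, the final part picks up the milliseconds and Z, whereas a +00:00 style offset would show the offset.

```javascript
// Mirrors the XSL expression:
// concat(substring(lastmod,0,11), concat(' ', substring(lastmod,12,5)), concat('', substring(lastmod,20,6)))
// XPath positions start at 1; JavaScript slice indices start at 0.
function formatLastmod(lastmod) {
  const date = lastmod.slice(0, 10); // substring(lastmod, 0, 11): characters 1 to 10
  const time = lastmod.slice(11, 16); // substring(lastmod, 12, 5): characters 12 to 16
  const tail = lastmod.slice(19, 25); // substring(lastmod, 20, 6): characters 20 to 25
  return `${date} ${time}${tail}`;
}

console.log(formatLastmod('2022-11-12T09:17:52.000Z')); // 2022-11-12 09:17.000Z
console.log(formatLastmod('2022-11-12T09:17:52+00:00')); // 2022-11-12 09:17+00:00
```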

Astro Sitemap: Page XML Route

Next up, here is the page sitemap code, in src/pages/page-sitemap.xml.js (the index sitemap links to this route). The update dates are more manual here: you have to remember to change them when you update page content. An alternative is using the file modified date, though this can be complicated when you deploy with continuous integration, since a fresh checkout typically gives every file a recent modification time.

import website from '~config/website';

const { siteUrl } = website;

export async function get({ request }) {
    const { url } = request;
    const { hostname, port, protocol } = new URL(url);
    const baseUrl = import.meta.env.PROD ? siteUrl : `${protocol}//${hostname}:${port}`;

    const postModules = await import.meta.glob('../content/posts/**/index.md');
    const posts = await Promise.all(Object.keys(postModules).map((path) => postModules[path]()));
    const lastPostUpdate = posts.reduce((accumulator, { frontmatter: { lastUpdated } }) => {
        const lastPostUpdatedValue = Date.parse(lastUpdated);
        return lastPostUpdatedValue > accumulator ? lastPostUpdatedValue : accumulator;
    }, 0);

    const lastPostUpdateDate = new Date(lastPostUpdate).toISOString();

    const pages = [
        { path: '', lastModified: lastPostUpdateDate },
        { path: '/contact/', lastModified: '2022-09-28T08:36:57.000Z' },
    ];

    const xmlString = `
<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="${baseUrl}/sitemap.xsl"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd http://www.google.com/schemas/sitemap-image/1.1 http://www.google.com/schemas/sitemap-image/1.1/sitemap-image.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
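${pages
    // Join explicitly: interpolating the array directly would insert commas between <url> entries.
    .map(
        ({ path, lastModified }) => `
<url>
  <loc>${baseUrl}${path}</loc>
  <lastmod>${lastModified}</lastmod>
</url>`,
    )
    .join('\n')}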
</urlset>`.trim();

    return { body: xmlString };
}

We repeat the logic to get the last post update date. In your own project, you will probably want to move that code into a utility function if you need it in more than one place. The pages array holds a manually compiled list of site pages (excluding dynamic post routes, which we add to their own sitemap in the next section). For each page we include the path and lastModified date, then use this array to generate the output XML.
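Following that suggestion, the repeated reduce could move into a shared helper. This is a minimal sketch assuming each post has the same frontmatter.lastUpdated shape as the modules loaded with import.meta.glob in the routes above; latestUpdate is a hypothetical name. Unlike the inline version, it returns null rather than the 1970 epoch date when no post has a valid date.

```javascript
// Returns the most recent frontmatter.lastUpdated across posts as an ISO 8601 string,
// or null when no post has a parseable date.
function latestUpdate(posts) {
  const latest = posts.reduce((accumulator, { frontmatter: { lastUpdated } }) => {
    const value = Date.parse(lastUpdated);
    // Date.parse gives NaN for missing or malformed dates; skip those.
    return Number.isNaN(value) ? accumulator : Math.max(accumulator, value);
  }, -Infinity);
  return Number.isFinite(latest) ? new Date(latest).toISOString() : null;
}

const posts = [
  { frontmatter: { lastUpdated: '2022-11-12T09:17:52.000Z' } },
  { frontmatter: { lastUpdated: '2022-11-14T09:17:52.000Z' } },
  { frontmatter: { lastUpdated: 'not a date' } },
];
console.log(latestUpdate(posts)); // 2022-11-14T09:17:52.000Z
```

Exported from a utility module, both the index and page routes could then call latestUpdate(posts) in place of their own reduce.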

Astro Sitemap: Post XML Route

Finally, our posts will have dynamic dates, using logic similar to what we saw earlier to get last modified fields from post Markdown frontmatter. Here is the src/pages/post-sitemap.xml.js code:

import website from '~config/website';

const { siteUrl } = website;

export async function get({ request }) {
    const { url } = request;
    const { hostname, port, protocol } = new URL(url);

    const baseUrl = import.meta.env.PROD ? siteUrl : `${protocol}//${hostname}:${port}`;
    const postModules = await import.meta.glob('../content/posts/**/index.md');
    const posts = await Promise.all(Object.keys(postModules).map((path) => postModules[path]()));
    const postsXmlString = posts.map(({ file, frontmatter: { lastUpdated } }) => {
        // The slug is the post's folder name, e.g. .../posts/folding-camera/index.md gives folding-camera
        const slug = file.split('/').at(-2);
        return `
<url>
  <loc>${baseUrl}/${slug}/</loc>
  <lastmod>${new Date(lastUpdated).toISOString()}</lastmod>
</url>`;
    });

    const xmlString = `
<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="${baseUrl}/sitemap.xsl"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd http://www.google.com/schemas/sitemap-image/1.1 http://www.google.com/schemas/sitemap-image/1.1/sitemap-image.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

${postsXmlString.join('\n')}
</urlset>`.trim();

    return { body: xmlString };
}

That’s it! Check out the new sitemaps in the browser.
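If you want to unit test the XML generation outside Astro, one option is to separate the string building from the endpoint. The sketch below assumes post objects shaped like the glob results above (a file path plus frontmatter.lastUpdated); postEntries, renderUrlset and escapeXml are hypothetical helpers. It entity-escapes values, which the sitemaps protocol requires for characters like & in URLs, and joins entries explicitly so no stray commas end up between url elements.

```javascript
// Escape the five XML special characters so URLs like ?a=1&b=2 stay well-formed.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Turn post modules into { loc, lastmod } entries.
// Assumes posts live at .../<slug>/index.md, as in the route above.
function postEntries(posts, baseUrl) {
  return posts.map(({ file, frontmatter: { lastUpdated } }) => {
    const parts = file.split('/');
    const slug = parts[parts.length - 2];
    return { loc: `${baseUrl}/${slug}/`, lastmod: new Date(lastUpdated).toISOString() };
  });
}

// Build a urlset document with an xml-stylesheet reference, one <url> per entry.
function renderUrlset(entries, stylesheetUrl) {
  const urls = entries
    .map(({ loc, lastmod }) => `  <url>\n    <loc>${escapeXml(loc)}</loc>\n    <lastmod>${escapeXml(lastmod)}</lastmod>\n  </url>`)
    .join('\n');
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    `<?xml-stylesheet type="text/xsl" href="${escapeXml(stylesheetUrl)}"?>`,
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    urls,
    '</urlset>',
  ].join('\n');
}

const samplePosts = [
  { file: '/project/src/content/posts/folding-camera/index.md', frontmatter: { lastUpdated: '2022-11-13T09:17:52.000Z' } },
  { file: '/project/src/content/posts/twin-lens-reflex-camera/index.md', frontmatter: { lastUpdated: '2022-11-14T09:17:52.000Z' } },
];
console.log(renderUrlset(postEntries(samplePosts, 'https://example.com'), 'https://example.com/sitemap.xsl'));
```

The endpoint then only has to load the posts, work out baseUrl and return the rendered string as the body.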

⚽️ HTTP Headers

Search engines do not need to index the sitemaps themselves, so you can serve a robots noindex directive from these routes. There are a few ways to set this up. If you use SSG (the Astro default) and host on Cloudflare or Netlify, add a public/_headers file to the project to tell the host which headers to include:

/page-sitemap.xml
  cache-control: public, max-age=0, must-revalidate
  x-robots-tag: noindex, follow
/post-sitemap.xml
  cache-control: public, max-age=0, must-revalidate
  x-robots-tag: noindex, follow
/sitemap_index.xml
  cache-control: public, max-age=0, must-revalidate
  x-robots-tag: noindex, follow

However, if you are running in SSR mode, you can include these headers in the Response object which your get function returns.
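For the SSR case, a minimal sketch could look like this. It assumes a runtime with the Fetch API Response global (Node 18 and later provide one) and reuses the header values from the _headers file above, plus the content-type header mentioned earlier for SSR routes; xmlResponse is a hypothetical helper name.

```javascript
// Hypothetical helper for SSR endpoints: wrap generated sitemap XML in a Response
// carrying the same headers as the static _headers file, plus an XML content type.
function xmlResponse(xmlString) {
  return new Response(xmlString, {
    status: 200,
    headers: {
      'content-type': 'application/xml; charset=utf-8',
      'cache-control': 'public, max-age=0, must-revalidate',
      'x-robots-tag': 'noindex, follow',
    },
  });
}

// In an endpoint you would end with: return xmlResponse(xmlString);
const response = xmlResponse('<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>');
console.log(response.headers.get('x-robots-tag')); // noindex, follow
```

Keeping the header values in one helper means every sitemap route sends the same directives.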

🙌🏽 Astro Sitemaps: Wrapping Up

In this post, we saw how to add Astro Sitemaps to your project. In particular, we saw:

  • how to use the Astro sitemap integration for hassle-free setup,
  • how you gain more control over the sitemap content using Astro Sitemaps XML resource routes,
  • serving noindex HTTP headers on sitemap routes.

You can see the full code for the project in the Astro Blog Markdown GitHub repo.

Hope you have found this post useful! I am keen to hear what you are doing with Astro and ideas for future projects. Also let me know about any possible improvements to the content above.

🙏🏽 Astro Sitemaps: Feedback

Have you found the post useful? Would you prefer to see posts on another topic instead? Get in touch with ideas for new posts. Also if you like my writing style, get in touch if I can write some posts for your company site on a consultancy basis. Read on to find ways to get in touch, further below. If you want to support posts similar to this one and can spare a few dollars, euros or pounds, please consider supporting me through Buy me a Coffee.

Finally, feel free to share the post on your social media accounts for all your followers who will find it useful. As well as leaving a comment below, you can get in touch via @askRodney on Twitter, @rodney@toot.community on Mastodon and also the #rodney Element Matrix room. Also, see further ways to get in touch with Rodney Lab. I post regularly on Astro as well as SEO. Also subscribe to the newsletter to keep up-to-date with our latest projects.
