Hacking Ghost: Adding dynamic sitemap.xml

I love working with Ghost. It doesn't have nearly as many features as WordPress, but it also doesn't have the baggage, which makes customizing it simple and straightforward. Today I'll be adding a dynamic sitemap.xml which is generated from your posts. There is an outstanding GitHub issue for this feature, but it looks like they won't be adding support for it until version 0.6. If you don't want to wait that long and are comfortable modifying a few files, there is no reason we can't get this working now. This walkthrough assumes you have a local copy of the project running and that you will be deploying via git or GitHub (something which will run npm install for you).

If you just want to see the diff, you can see it in this commit.

Be aware that if you do a global update of your Ghost install, it will wipe out your changes and you will have to re-implement this functionality. That's the cost of hacking in functionality outside of a plugin ecosystem.

The first step in getting this functionality is to add a node module for generating sitemaps so you don't have to generate the XML yourself. Through a quick search I found the appropriately named sitemap module, which we will use for this example. Simply run npm install sitemap --save to get the module and update your package.json file for when you deploy later.

Now there are just two files to modify to add this functionality. The first file we'll modify will add the new route, which will map to the controller where we'll add the actual implementation.

/core/server/routes/frontend.js

    // ### Frontend routes
    server.get('/sitemap.xml', frontend.sitemap); // <- new line
    server.get('/rss/', frontend.rss);
    server.get('/rss/:page/', frontend.rss);

Now the implementation itself goes inside the controller.

/core/server/controllers/frontend.js

Add the reference to the sitemap module.

when        = require('when'),
Route       = require('express').Route,
sm          = require('sitemap'), // <- get reference to sitemap module

api         = require('../api'),

Add the function which actually creates the sitemap xml.

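// Adds one post per tick so a long list doesn't block the event loop,
// then passes the finished XML to done().
function buildSitemap(posts, done, sitemap) {
  // Create the sitemap on the first call; later calls reuse it.
  sitemap = sitemap || sm.createSitemap({
    hostname: config().url,
    cacheTime: 600000 // 10 minutes, in milliseconds
  });

  if (posts.length > 0) {
    // Note: shift() consumes the posts array that was passed in.
    var post = posts.shift();
    sitemap.add({ url: '/' + post.slug + '/' });
    process.nextTick(buildSitemap.bind(this, posts, done, sitemap));
  } else {
    sitemap.toXML(function (xml) {
      done(xml);
    });
  }
}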

Lastly we need to add the function which the route is mapped to. This is added to the frontendControllers object.

frontendControllers = {
  'sitemap': function(req, res, next) {
    api.posts.browse({ staticPages: 'all', limit: 1000 }).then(function(result) {
      buildSitemap(result.posts, function(sitemap) {
        res.header('Content-Type', 'application/xml');
        res.send(sitemap);
      });
    });
  },
  'homepage': function (req, res, next) {

Now if you run git status you should see the files that we've changed.

To test locally, just run npm start and try requesting /sitemap.xml. You should see an XML document with one URL entry for each of your posts.
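The exact output depends on the version of the sitemap module, but the sitemaps.org protocol fixes the general shape: a urlset element containing one url/loc pair per page. As a rough illustration only, the sketch below renders that shape by hand, without the sitemap module. The renderSitemap and escapeXml helpers, the example.com hostname, and the slugs are all made up for this example.

```javascript
// Hand-rolled sketch of the sitemap XML shape. The walkthrough itself uses
// the sitemap module; renderSitemap and escapeXml are hypothetical helpers.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

function renderSitemap(hostname, slugs) {
  // Drop trailing slashes so we don't emit "http://example.com//slug/".
  var base = hostname.replace(/\/+$/, '');
  var entries = slugs.map(function (slug) {
    return '  <url><loc>' + escapeXml(base + '/' + slug + '/') + '</loc></url>';
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  ].concat(entries, ['</urlset>']).join('\n');
}

// Hypothetical blog URL and slugs, for illustration only.
var exampleXml = renderSitemap('http://example.com/', ['welcome-to-ghost', 'hello-world']);
console.log(exampleXml);
```

If the response from your blog has this overall structure, with your own hostname and slugs, the route is working.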

Now you have a dynamic sitemap.xml! This code regenerates the sitemap every time a request is made to /sitemap.xml. In theory we could cache it and only rebuild it when one of the posts changes, but that's a micro-optimization that isn't necessary for the vast majority of blogs running Ghost. Hopefully someone finds this useful until Ghost adds official support for sitemaps!
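If you do want the caching mentioned above, one possible shape is a small in-memory cache that is cleared whenever a post changes. This is only a sketch under assumptions: createSitemapCache and invalidate are hypothetical names, the generator is synchronous here for brevity (buildSitemap above is callback-based), and where you would call invalidate() inside Ghost is left open.

```javascript
// Minimal in-memory cache sketch. createSitemapCache is a hypothetical
// helper, not part of Ghost or the sitemap module.
function createSitemapCache(generate) {
  var cachedXml = null;
  return {
    // Build the XML on the first request, then reuse it.
    get: function () {
      if (cachedXml === null) {
        cachedXml = generate();
      }
      return cachedXml;
    },
    // Call this whenever a post is created, edited, or deleted.
    invalidate: function () {
      cachedXml = null;
    }
  };
}

// Stand-in generator that counts how often the sitemap is rebuilt.
var buildCount = 0;
var cache = createSitemapCache(function () {
  buildCount += 1;
  return '<urlset><!-- build ' + buildCount + ' --></urlset>';
});

cache.get();        // builds the XML
cache.get();        // served from memory
cache.invalidate(); // a post changed
cache.get();        // builds again
console.log(buildCount); // 2
```

The trade-off is the usual one: the sitemap is only as fresh as your invalidation calls, so a missed hook means a stale file until the process restarts.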

Here is a diff containing all the changes I made above.

Running Express 4 apps on Azure Websites

UPDATE

None of the following is necessary anymore. The Azure team has apparently updated their detection script and default web.config file to work with the latest version of Express.


A major upgrade of Express was released recently, and as often happens, it came with a number of breaking changes. Most of these are addressed quite nicely in the Express 4 Migration Guide. One issue which isn't covered at all is Express's new entry point. Instead of using /server.js or /app.js like most Node websites (and like the previous versions of Express scaffolding), Express 4 uses /bin/www. This causes a few issues with Azure and IISNode, which I cover below. If you're just interested in the fix, you can get the working web.config file from GitHub and drop it in the root of your Express 4 application.

Get the web.config file!

Azure Websites & Node.js

Since Azure Websites uses IISNode to handle requests and doesn't just launch its own node instance, it cannot use npm start the way some other PaaS providers can. Instead it looks for an /app.js or /server.js file. If it finds one, it will generate a web.config file which contains the settings necessary for IISNode to route web requests to your application.

I understand why they cannot use npm start, but I don't see any reason why they can't use the "main" declaration to determine the application entry point. I have created an issue on their feedback page requesting this feature.
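To make the request concrete, here is a rough sketch of what such detection could look like. This is not Azure's actual detection script: detectEntryPoint is a hypothetical function, the order in which server.js and app.js are checked is my assumption, and it presumes your package.json declares a "main" field pointing at the entry point.

```javascript
// Hypothetical sketch only; this is NOT Azure's real detection logic.
// files: relative paths present in the site root
// pkg:   the parsed contents of package.json
function detectEntryPoint(files, pkg) {
  // Prefer an explicit "main" from package.json when that file exists.
  if (pkg && typeof pkg.main === 'string') {
    var main = pkg.main.replace(/^\.?\//, ''); // strip a leading "./" or "/"
    if (files.indexOf(main) !== -1) {
      return main;
    }
  }
  // Otherwise fall back to the conventional file names.
  var conventional = ['server.js', 'app.js'];
  for (var i = 0; i < conventional.length; i++) {
    if (files.indexOf(conventional[i]) !== -1) {
      return conventional[i];
    }
  }
  return null;
}

var express4Files = ['app.js', 'bin/www', 'package.json'];
console.log(detectEntryPoint(express4Files, { main: './bin/www' })); // bin/www
console.log(detectEntryPoint(express4Files, {}));                    // app.js
```

The second call shows the problem described here: without looking at "main", the Express 4 layout is detected as app.js, which is not the file that actually starts the server.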

To work around this, we need to supply our own web.config file which tells IISNode to use /bin/www instead of the /app.js it detected. This is pretty straightforward, and there are three sections of the default web.config we'll need to update.

<handlers>
    <add name="iisnode" path="app.js" verb="*" modules="iisnode"/>
</handlers>
<rule name="NodeInspector" patternSyntax="ECMAScript" stopProcessing="true">
    <match url="^app.js\/debug[\/]?" />
</rule>
<rule name="DynamicContent">
  <conditions>
    <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="True"/>
  </conditions>
  <action type="Rewrite" url="app.js"/>
</rule>

Under normal circumstances, that's all you would have to do to get things working. However, there is another problem with this setup. For security reasons, IIS prevents requests to the following paths by default.

  • web.config
  • bin
  • App_code
  • App_GlobalResources
  • App_LocalResources
  • App_WebReferences
  • App_Data
  • App_Browsers

That means any request to /bin will not be processed. Since our Express 4 entry point is /bin/www, our application still won't work. In this case we just have to tell IIS that it's okay to process requests to /bin, which we can do using the <hiddenSegments> configuration within our web.config file.

<security>
  <requestFiltering>
    <hiddenSegments>
      <remove segment="bin" />
    </hiddenSegments>
  </requestFiltering>
</security>
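To see why removing the segment matters, here is a simplified model of the check. It is only a sketch: isHidden is a hypothetical function, and I'm assuming the filter compares whole path segments case-insensitively, so /bin/www is blocked while /binary/file.js is not.

```javascript
// Simplified model of IIS request filtering's hiddenSegments check.
// isHidden is a hypothetical helper; the real filtering happens inside IIS.
function isHidden(urlPath, hiddenSegments) {
  var hidden = hiddenSegments.map(function (s) { return s.toLowerCase(); });
  // A request is rejected if any whole path segment matches a hidden segment.
  return urlPath.split('/').some(function (segment) {
    return segment !== '' && hidden.indexOf(segment.toLowerCase()) !== -1;
  });
}

// The default list from above, and the same list after <remove segment="bin" />.
var defaults = ['web.config', 'bin', 'App_code', 'App_GlobalResources',
  'App_LocalResources', 'App_WebReferences', 'App_Data', 'App_Browsers'];
var withoutBin = defaults.filter(function (s) { return s !== 'bin'; });

console.log(isHidden('/bin/www', defaults));        // true
console.log(isHidden('/bin/www', withoutBin));      // false
console.log(isHidden('/binary/file.js', defaults)); // false
```

Keep in mind that removing the segment exposes everything under /bin to requests, so it is worth keeping that folder limited to the entry point.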

Finally, here is the finished product which should get your Express 4 application running in Azure Websites.

Make sure to check the GitHub repo for the latest version of the file.

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <system.webServer>

    <!-- 
      By default IIS will block requests going to the bin directory for security reasons. 
      We need to disable this since that's where Express has put the application entry point. 
    -->
    <security>
      <requestFiltering>
        <hiddenSegments>
          <remove segment="bin" />
        </hiddenSegments>
      </requestFiltering>
    </security>

    <handlers>
      <!-- Indicates that the www file is a node.js entry point -->
      <add name="iisnode" path="/bin/www" verb="*" modules="iisnode"/>
    </handlers>
    <rewrite>
      <rules>
        <rule name="NodeInspector" patternSyntax="ECMAScript" stopProcessing="true">
          <match url="^bin\/www\/debug[\/]?" />
        </rule>

        <!-- 
          First we consider whether the incoming URL matches a physical file in the /public folder. 
          This means IIS will handle your static resources, and you don't have to use express.static 
        -->
        <rule name="StaticContent">
          <action type="Rewrite" url="public{REQUEST_URI}"/>
        </rule>

        <!-- All other URLs are mapped to the node.js entry point -->
        <rule name="DynamicContent">
          <conditions>
            <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="True"/>
          </conditions>
          <action type="Rewrite" url="/bin/www"/>
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
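The way the StaticContent and DynamicContent rules work together can be a little opaque, so here is a simplified model of how I read them. It is a sketch, not a description of the URL Rewrite module's internals: resolveRequest is a hypothetical function, the file list is made up, and the NodeInspector rule is left out.

```javascript
// Simplified model of the StaticContent and DynamicContent rules.
// resolveRequest is a hypothetical helper; the file list is invented.
function resolveRequest(requestUri, filesOnDisk) {
  // StaticContent: every URL is first rewritten into the public folder.
  var rewritten = 'public' + requestUri;
  // DynamicContent: if the rewritten path is not a physical file,
  // the request is handed to the node.js entry point instead.
  if (filesOnDisk.indexOf(rewritten) === -1) {
    return 'bin/www';
  }
  return rewritten;
}

var filesOnDisk = ['public/stylesheets/style.css', 'bin/www', 'app.js'];
console.log(resolveRequest('/stylesheets/style.css', filesOnDisk)); // public/stylesheets/style.css
console.log(resolveRequest('/users', filesOnDisk));                 // bin/www
```

In other words, anything that exists under /public is served directly by IIS, and everything else ends up in your Express application.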