Publishing a blog on ActivityPub, the hard way

This site has been up for at least 15 years now, and for all this time the blog part has been pretty much dead; in fact, I recently deleted some posts about projects that never saw the light of day and weren’t even brag-worthy anymore. Time to do something about it, I thought; unfortunately, I find building infra much more interesting than writing articles, and so I decided that first I’ll make it easy for people to subscribe to the blog and write something interesting some other day, surely.

Now, this is a static site built with Hugo, which comes with Atom templates out of the box; of course, the feedverse isn’t quite what it used to be in the heyday of Google Reader (never forget), but as people are growing more dissatisfied with the walled garden model of large platforms, the venerable self-hosted site seems to be undergoing a renaissance of sorts. (Which is why I’m being stubborn and doing this myself when simply using Medium/Substack/whatever would’ve sufficed; we don’t do this because it’s easy.)

But more is more, and ActivityPub looked like an interesting addition to Atom to explore.

Staticn’t

The thing is, you quickly discover that, unlike RSS/Atom, ActivityPub cannot be implemented with just a bunch of JSON in an S3 bucket. A crucial part of the protocol is the inbox endpoint, which needs to serve dynamic requests and must be functional for anybody to even subscribe to your updates, not to mention interact with them. Additionally, the updates themselves are expected to be actively broadcast to followers; while the outbox endpoint could technically be used as a polling fallback, that would quickly become impractical and is not commonly done.
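
To make the inbox requirement a bit more concrete, here is a minimal Python sketch of what subscribing involves: a remote server POSTs a Follow activity to the inbox, and the blog's side is expected to answer with an Accept delivered to the follower. The URLs and the `build_accept` helper are hypothetical, and request signing and delivery are left out entirely; this is not the co-server's actual code.

```python
import json

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def build_accept(follow: dict, accept_id: str) -> dict:
    """Build the Accept activity that acknowledges an incoming Follow."""
    if follow.get("type") != "Follow":
        raise ValueError("not a Follow activity")
    return {
        "@context": AS_CONTEXT,
        "id": accept_id,
        "type": "Accept",
        # the followed actor (the blog) is the one doing the accepting
        "actor": follow["object"],
        # echo the original Follow so the remote side can match it up
        "object": follow,
    }


# Hypothetical Follow as it might arrive in the inbox
incoming = {
    "@context": AS_CONTEXT,
    "id": "https://social.example/activities/1",
    "type": "Follow",
    "actor": "https://social.example/users/alice",
    "object": "https://blog.example/actor",
}

accept = build_accept(incoming, "https://blog.example/activities/accept-1")
print(json.dumps(accept, indent=2))
```

None of this can be precomputed into a static file, because the Follow can come from any actor at any time.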

Prior art

I found two examples of working around this by keeping things as static as possible and using serverless tech for the dynamic pieces:

While I did plunder them liberally for inspiration and snippets, eventually I went in a different direction. Serverless is not an important factor for me; I already run this site on a full-blown VM and it works well enough not to bother changing. Instead, I decided that since some parts of the solution require dynamic logic, I might as well go all in and build an actual server that speaks the ActivityPub protocol natively.

I should also mention that there are some generic RSS-to-ActivityPub bridging projects (dariusk/rss-to-activitypub, mastofeeder). By the time I discovered them I had already committed to doing this myself: running my own code on my own infra lets me build everything precisely the way I want it instead of adjusting to other people’s decisions. But if you’re also considering bridging your blog and just want a quick off-the-shelf solution, they’re worth looking at.

Of course, there’s also the simple option of a cron job that polls the feed and posts updates as a regular Mastodon account; if you hate fun, that is.

The problem statement

I approached the project with the following in mind:

  • I have a standalone server running Nginx that I fully control
  • I want the ActivityPub actors to be hosted on the ulfurinn.net domain because I’ve grown somewhat attached to it
  • I want all resource URLs to be accessible directly in the browser in a human-friendly format, as well as over the federation protocol, depending on the Accept or Content-Type header, the same way Mastodon does it
  • I want to use Hugo as much as I can for processing Markdown
  • I want to be able to render replies and likes directly on the post page, Disqus-like, either statically or dynamically
  • I am currently most comfortable writing code in Elixir
  • Elixir apps are pretty easy to deploy as Docker images
  • I have a small k8s setup I’ve been meaning to play with

The outcome

In hindsight, was this overkill? Absolutely. But would I do it again? Obviously.

sequenceDiagram
  participant federation
  participant browser
  participant nginx
  participant co-server
  
  co-server->>nginx: poll Atom feed for new posts
  nginx-->>co-server: 
  note over co-server,nginx: periodic
  
  co-server->>nginx: GET /path/to/post<br>accept: application/x-blogpub-partial
  nginx-->>co-server: serve static partial JSON
  co-server->>co-server: store in DB
  note over co-server,nginx: for each new post

  browser->>nginx: GET /path/to/post<br>accept: text/html
  nginx-->>browser: serve static HTML

  federation->>nginx: GET /path/to/post<br>accept: application/activity+json
  nginx->>co-server: proxy
  co-server-->>federation: serve dynamically built APub JSON

  federation->>nginx: GET /actor<br>POST /inbox<br>accept: application/activity+json
  nginx->>co-server: proxy
  co-server-->>federation: process request and serve APub JSON

The server part consists of Nginx with some special config and a co-server implementing ActivityPub (well, the required parts of it). The two depend on each other, which may make the setup not so easily portable outside of my own installation, though this can be improved if there’s interest.
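
As a rough illustration of the polling half of the diagram, the following Python sketch picks out feed entries that haven't been seen before and prepares the request for each post's partial. The co-server itself is written in Elixir, so this is only an approximation of the idea; the sample feed, `new_post_urls`, and `partial_request` are made up for the example, and nothing is actually sent over the network.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def new_post_urls(feed_xml: str, seen: set) -> list:
    """Return links of feed entries whose Atom id has not been seen yet."""
    root = ET.fromstring(feed_xml)
    urls = []
    for entry in root.findall(f"{ATOM}entry"):
        entry_id = entry.findtext(f"{ATOM}id")
        if entry_id is None or entry_id in seen:
            continue
        link = entry.find(f"{ATOM}link")
        if link is not None and link.get("href"):
            urls.append(link.get("href"))
        seen.add(entry_id)
    return urls


def partial_request(url: str) -> urllib.request.Request:
    """Prepare (but do not send) the request for a post's partial JSON."""
    return urllib.request.Request(
        url, headers={"Accept": "application/x-blogpub-partial"}
    )


# Hypothetical feed with one already-known post and one new post
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <entry>
    <id>https://blog.example/posts/old/</id>
    <link href="https://blog.example/posts/old/"/>
  </entry>
  <entry>
    <id>https://blog.example/posts/new/</id>
    <link href="https://blog.example/posts/new/"/>
  </entry>
</feed>"""

seen_ids = {"https://blog.example/posts/old/"}
fresh = new_post_urls(SAMPLE_FEED, seen_ids)
requests = [partial_request(u) for u in fresh]
print(fresh)
```

In a real deployment the set of seen ids would live in the database rather than in memory.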

The full source code is not currently public because there are shameful sins being committed, but I’ll publish it once I iron out the kinks.

Hugo setup

I liked Paul Kinlan’s idea of generating ActivityPub objects with Hugo templates; however, instead of full articles and the outbox feed, I only build partial objects that the co-server will consume and decorate. This lets me use Hugo for Markdown processing, but the complete ActivityPub resources will be dynamically generated by the co-server.

The single.ajson template:

{{
  dict
  "id" .Permalink
  "url" .Permalink
  "type" "Article"
  "name" .Title
  "summary" (.Summary | transform.Plainify | transform.HTMLUnescape | replaceRE `\n+` "\n" | replaceRE `\n$` "")
  "content" (.Content | safe.HTML | replaceRE `\n+` "\n" | replaceRE `\n$` "")
  "published" (dateFormat "2006-01-02T15:04:05-07:00" .Date)
  |
  jsonify
-}}

The template has slight variations for the three feed types I have, which map to ActivityPub object types Article, Page, and Note. The differences are in field semantics; e.g., notes don’t need summaries because their main content is expected to be short enough, and pages have their URLs pointing to external resources.
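
To give an idea of what "decorating" a partial could mean, here is a hedged Python sketch that adds the fields Hugo doesn't know about and wraps the result in a Create activity for broadcasting. The helper names, the `/followers` collection URL, and the `#create` id suffix are assumptions made for the example, not what the co-server necessarily does.

```python
import json

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def decorate(partial: dict, actor_url: str) -> dict:
    """Turn a Hugo-generated partial into a complete object (hypothetical helper)."""
    obj = dict(partial)  # copy, so the stored partial is left untouched
    obj["@context"] = AS_CONTEXT
    obj["attributedTo"] = actor_url
    obj["to"] = [PUBLIC]
    # assumed location of the actor's followers collection
    obj["cc"] = [f"{actor_url}/followers"]
    return obj


def wrap_in_create(obj: dict, actor_url: str) -> dict:
    """Wrap the object in the Create activity that is sent to followers."""
    return {
        "@context": AS_CONTEXT,
        "id": obj["id"] + "#create",  # assumed id scheme
        "type": "Create",
        "actor": actor_url,
        "published": obj["published"],
        "to": obj["to"],
        "cc": obj["cc"],
        "object": obj,
    }


# Hypothetical partial, shaped like the output of the template above
partial = {
    "id": "https://blog.example/posts/hello/",
    "url": "https://blog.example/posts/hello/",
    "type": "Article",
    "name": "Hello",
    "summary": "A short summary",
    "content": "<p>Hello, fediverse.</p>",
    "published": "2024-01-01T12:00:00+00:00",
}

article = decorate(partial, "https://blog.example/actor")
activity = wrap_in_create(article, "https://blog.example/actor")
print(json.dumps(activity, indent=2))
```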

main.toml:

[mediaTypes]
  [mediaTypes."application/activity+json"]
    suffixes = ["ajson"]

[outputFormats]
  [outputFormats.apub]
    mediaType = "application/activity+json"
    notAlternative = true
    baseName = ""

This will tell Hugo to produce an .ajson file next to the normal index.html.

Finally, each post needs to specify outputs: [html, apub] in its frontmatter. I tried setting it up globally in the config, but couldn’t figure out how to make Hugo ignore the APub output for other site sections. This is not a huge problem because it can be included in the archetype file, which ensures it’s added to all new source files.

Nginx setup

There are two things that need to be set up: content negotiation based on the Accept and Content-Type headers, and reverse proxying to the co-server.

Each article exists in three different versions:

  • normal HTML, produced by Hugo
  • a standard APub document, produced by the co-server
  • an APub partial, produced by Hugo as shown above

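These map blocks check the Accept and Content-Type headers and set variables used for routing; $apub ends up non-empty when either header marks the request as ActivityPub: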

map $http_accept $accept {
  default "";
  "~.*application/activity\+json.*" "apub";
  "~.*application/ld\+json; profile=\"https://www.w3.org/ns/activitystreams\"" "apub";
  "~.*application/x-blogpub-partial.*" "blogpub";
}
map $http_content_type $content {
  default "";
  "~.*application/activity\+json.*" "apub";
  "~.*application/ld\+json; profile=\"https://www.w3.org/ns/activitystreams\"" "apub";
}

map "$accept;$content" $apub {
  default "";
  "~\bapub\b" "1";
}
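
For readers who find Nginx maps hard to follow, the same classification can be approximated in Python. This is only a mirror of the logic above for illustration: the function names are invented, and the patterns are copied from the maps (first match wins, as with regex entries in an Nginx map).

```python
import re

# Patterns copied from the maps above, tried in order
APUB_PATTERNS = [
    re.compile(r"application/activity\+json"),
    re.compile(
        r'application/ld\+json; profile="https://www\.w3\.org/ns/activitystreams"'
    ),
]
PARTIAL_PATTERN = re.compile(r"application/x-blogpub-partial")


def classify_accept(value: str) -> str:
    """Equivalent of the $accept map."""
    if any(p.search(value) for p in APUB_PATTERNS):
        return "apub"
    if PARTIAL_PATTERN.search(value):
        return "blogpub"
    return ""


def classify_content_type(value: str) -> str:
    """Equivalent of the $content map."""
    if any(p.search(value) for p in APUB_PATTERNS):
        return "apub"
    return ""


def apub_flag(accept: str, content_type: str) -> str:
    """Equivalent of the $apub map: "1" if either header says ActivityPub."""
    combined = f"{classify_accept(accept)};{classify_content_type(content_type)}"
    return "1" if re.search(r"\bapub\b", combined) else ""


print(repr(apub_flag("application/activity+json", "")))
print(repr(apub_flag("text/html", "")))
```

The Content-Type check matters for inbox deliveries, which are POST requests and may not carry a useful Accept header.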

This sets up the routing (standard Nginx bits abridged):

server {
        location / {
                proxy_pass_header Date;
                proxy_set_header X-Forwarded-Host $http_host;

                if ($apub) {
                        # any APub traffic gets forwarded
                        proxy_pass https://(co-server URL);
                        break;
                }
        }

        # Webfinger is also forwarded
        location = /.well-known/webfinger {
                proxy_pass https://(co-server URL);
        }

        # partials get rendered from a static file
        if ($accept = "blogpub") {
                rewrite ^(/.*)$ $1/.ajson last;
        }
}

(Am I doing something silly here? Probably. Never quite figured out Nginx configuration.)
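
The Webfinger endpoint forwarded above is what lets other servers resolve a handle like name@domain to an actor URL. A minimal Python sketch of the lookup follows; the handle, domain, and actor URL are placeholders, and `webfinger_response` is a hypothetical helper rather than the co-server's real code. The response is a JRD document, normally served as application/jrd+json.

```python
import json


def webfinger_response(resource: str, domain: str, actors: dict):
    """Return a JRD dict for acct:name@domain, or None if the actor is unknown."""
    prefix = "acct:"
    if not resource.startswith(prefix):
        return None
    name, _, host = resource[len(prefix):].partition("@")
    if host != domain or name not in actors:
        return None
    return {
        "subject": resource,
        "links": [
            {
                "rel": "self",
                "type": "application/activity+json",
                "href": actors[name],
            }
        ],
    }


# Hypothetical actor table: handle name -> actor URL
ACTORS = {"blog": "https://blog.example/actor"}

found = webfinger_response("acct:blog@blog.example", "blog.example", ACTORS)
missing = webfinger_response("acct:nobody@blog.example", "blog.example", ACTORS)
print(json.dumps(found, indent=2))
```

A lookup that returns None would translate to a 404 response.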

Future work

The bits of APub that work so far are:

  • outbox
  • direct post access
  • actor profile
  • accepting followers
  • receiving replies, likes, and boosts

What’s still missing from the wishlist is feeding APub state back to the static page. Hugo can pull arbitrary JSON from arbitrary URLs at build time, so it’s fairly straightforward to add. The only thing I’ve done as a proof of concept is show the current number of followers.

This requires periodic rebuilds, of course. The site is currently regenerated nightly to enable Hugo features like delayed and expired publications; it’s not a problem to do it more frequently if needed.

Updating already-published posts is not implemented yet, so I need to proofread quite carefully for now.

There are also trickier bits of APub like object signatures; the current state of affairs is quite messy when it comes to different implementations and their interop. Since the blog actors are passive, in the sense that they don’t interact with anyone apart from broadcasting new posts, I’m hoping this won’t be necessary to build.