Securing ActivityPub
Why this is so hard in Mastodon:
Mastodon was originally built on OStatus, which is Atom/RSS with extra bits. The extra bits are mostly just PuSH (PubSubHubbub), which lets the Atom feed tell you, the subscriber, when it has new data (there's a decent breakdown of PuSH here: https://blog.superfeedr.com/howto-pubsubhubbub/). Places like Diaspora, which have their own Mastodon-like API, make some attempt at access control, but in general there's no expectation of privacy in the OStatus culture. This is totally fine for something like a blog, but the social network many of us envision in the Activity...verse? ...is a kind, thoughtful, nuanced thing. We need more. So let's talk about more.
We have audience control in ActivityPub, and lots of it
An ActivityPub Activity object may have “to”, “bto”, “cc”, “bcc”, and “audience” fields, kind of like email's “to, cc, bcc”.
- “to” is the public primary audience for the activity. The message is delivered to these users, and anyone who can see the activity can see who's in the “to” field.
- “cc” is the public secondary audience for the activity. In the spec, there's really no difference from “to” addressees in functionality or visibility.
- “bto” and “bcc” are blind versions of the above. You won't (or at least shouldn't) ever receive an activity object with a “bcc” on it, unless, maybe, it's you.
- “audience” is enigmatic. Since it's not listed as public or private, it should be considered public, and since it's pretty under-documented, we only know that it does trigger delivery of the message to any actors listed.
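To make the delivery rules concrete, here's a minimal Python sketch. It assumes addressees are plain ID strings rather than embedded objects. It gathers delivery targets from all five fields and drops the Public collection, since ActivityPub says servers must not deliver to it. It then strips “bto” and “bcc” from the copy that goes over the wire, as the spec requires. The helper names are mine, not from any library.

```python
from typing import Any

AS_PUBLIC = "https://www.w3.org/ns/activitystreams#Public"
ADDRESSING_FIELDS = ("to", "bto", "cc", "bcc", "audience")
BLIND_FIELDS = ("bto", "bcc")


def as_list(value: Any) -> list:
    """Addressing fields may hold one value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def delivery_targets(activity: dict) -> set:
    """Every addressee across all five fields, minus the Public collection,
    which is never a real delivery target."""
    targets = set()
    for field in ADDRESSING_FIELDS:
        targets.update(as_list(activity.get(field)))
    targets.discard(AS_PUBLIC)
    return targets


def wire_copy(activity: dict) -> dict:
    """The version we actually send: blind fields removed, original untouched."""
    return {k: v for k, v in activity.items() if k not in BLIND_FIELDS}
```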
In addition, if any of these fields contains the special string https://www.w3.org/ns/activitystreams#Public, the post is public and “shall be accessible to all users, without authentication”. Here's a small surprise: almost every activity Mastodon serves (including unlisted posts, which use “cc”!) is addressed to this collection.
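Here's roughly how Mastodon reads visibility back out of that addressing. This is a simplified Python sketch, not Mastodon's actual code. The compacted forms “as:Public” and “Public” are included because JSON-LD compaction can produce them, and the function and argument names are hypothetical.

```python
from typing import Any

# The full IRI plus the compacted forms JSON-LD processing can produce.
PUBLIC_FORMS = {
    "https://www.w3.org/ns/activitystreams#Public",
    "as:Public",
    "Public",
}


def as_list(value: Any) -> list:
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def addressed_to_public(activity: dict, field: str) -> bool:
    return any(a in PUBLIC_FORMS for a in as_list(activity.get(field)))


def visibility(activity: dict, followers_url: str) -> str:
    """Simplified Mastodon-style classification of an activity's audience."""
    if addressed_to_public(activity, "to"):
        return "public"
    if addressed_to_public(activity, "cc"):
        return "unlisted"        # still addressed to Public, just kept off timelines
    if followers_url in as_list(activity.get("to")):
        return "followers-only"
    return "direct"
```

Only the last two branches leave any room for access control. “Public” and “unlisted” are both addressed to Public, so both must be served to anyone who asks.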
And finally, Collections are valid addressees. A followers collection, like https://example.com/alice/followers, is the popular one, but there's no reason we can't also have Circles (like Google+ had), channels, instance-specific groups, and so on.
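Collections can contain other Collections, such as a Circle inside a followers list. That means a server has to flatten the audience before it can deliver or check access. Here's a minimal sketch. The hypothetical members mapping stands in for actually fetching a Collection (real ones are paged), and a guard stops it looping on collections that contain each other.

```python
def expand_audience(addressees: list, members: dict) -> set:
    """Resolve addressees to individual actors.

    `members` is a hypothetical stand-in for fetching a Collection: it maps a
    collection ID to the IDs it contains. Anything not in it is treated as an actor.
    """
    actors = set()
    seen = set()                  # guards against collections that contain each other
    stack = list(addressees)
    while stack:
        item = stack.pop()
        if item in seen:
            continue
        seen.add(item)
        if item in members:
            stack.extend(members[item])   # a Collection: expand its items
        else:
            actors.add(item)
    return actors
```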
We have another tool at our disposal: the very vagueness in the spec that's driving us all insane. The W3C seems to be aware that over-specifying at this early stage could stifle innovation. We could take the opportunity to build clean conventions that clearly describe where data can and cannot go, enforce them on our instances, and fold them into future W3C recommendations.
Let's stop volunteering data
The biggest problem is that public collection. Unless we want to break with the AP spec, we can't both address posts to Public and control access to them. Let's stop using it except when we really, really mean it.
That doesn't mean we can't have public... well, “mostly-public” posts. Set up a local timeline collection for your instance, and address to it for locally “listed” posts. At that point you're delivering to any tagged user, plus the local timeline. A good start.
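A “locally listed” post might look like the following sketch. The local timeline collection ID is hypothetical, since nothing in ActivityPub or Mastodon defines one today, and the note ID scheme is made up for illustration. The point is what's missing: the Public collection appears nowhere in the addressing.

```python
import uuid

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def locally_listed_note(actor: str, content: str, mentions: list,
                        local_timeline: str) -> dict:
    """Build a Create for a locally "listed" post without the Public collection.

    `local_timeline` is a hypothetical instance-level Collection ID.
    """
    note_id = f"{actor}/notes/{uuid.uuid4()}"   # made-up ID scheme
    note = {
        "id": note_id,
        "type": "Note",
        "attributedTo": actor,
        "content": content,
        "to": [local_timeline],
        "cc": list(mentions),                   # tagged users still get a copy
    }
    return {
        "@context": AS_CONTEXT,
        "id": f"{note_id}/activity",
        "type": "Create",
        "actor": actor,
        "object": note,
        "to": list(note["to"]),                 # mirror the object's addressing
        "cc": list(note["cc"]),
    }
```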
If we can learn about other instances' timeline collections, from either nodeinfo or Mastodon's instance endpoint, we could potentially name them as “cc” targets as well, making posts more public but still legal to access-control.
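NodeInfo discovery itself is a real, standard step. /.well-known/nodeinfo returns a list of links, one per schema version, and the NodeInfo 2.x document has a free-form “metadata” object. Advertising a timeline collection there is purely hypothetical; localTimelineCollection is a key I made up. Here's a sketch that works on already-fetched JSON:

```python
from typing import Optional

NODEINFO_REL_PREFIX = "http://nodeinfo.diaspora.software/ns/schema/"


def nodeinfo_href(well_known: dict) -> Optional[str]:
    """Pick the newest NodeInfo schema link from /.well-known/nodeinfo."""
    best, best_version = None, ()
    for link in well_known.get("links", []):
        rel = link.get("rel", "")
        if not rel.startswith(NODEINFO_REL_PREFIX):
            continue
        try:
            version = tuple(int(p) for p in rel[len(NODEINFO_REL_PREFIX):].split("."))
        except ValueError:
            continue                          # skip versions we can't parse
        if version > best_version:
            best, best_version = link.get("href"), version
    return best


def timeline_collection(nodeinfo: dict) -> Optional[str]:
    """Hypothetical convention: an instance advertises its timeline collection
    in NodeInfo's free-form "metadata" object."""
    return nodeinfo.get("metadata", {}).get("localTimelineCollection")


def cc_with_remote_timelines(cc: list, nodeinfo_docs: list) -> list:
    """Append every advertised timeline collection to cc, skipping duplicates."""
    result = list(cc)
    for doc in nodeinfo_docs:
        timeline = timeline_collection(doc)
        if timeline and timeline not in result:
            result.append(timeline)
    return result
```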
“Strong federation” – using allow-lists and block-lists
Which brings me to strong federation. It seems like madness to me that we, as an instance, can't say “these other instances are aligned with our ideals and moderation policies” and create an addressable meta-collection for “public to all the people we care about”, whose objects can only be fetched with authentication. I suggest /collections/strong_federation or something similar.
We could also define a special almost-public meta-collection that refers to everyone known and not blocklisted, again fetchable only with authentication. I suggest /collections/no_nazis just to drive the point home, but in a pinch /collections/almost_public will do.
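Checking membership in these two meta-collections comes down to the requester's domain. Here's a sketch that assumes the fetch has already been authenticated (say, with an HTTP Signature) and that requester is the verified actor ID. The collection URLs are hypothetical. Letting the blocklist win over the allowlist, and applying both lists to subdomains, is my own design choice.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical meta-collection IDs for an instance at example.com.
STRONG_FEDERATION = "https://example.com/collections/strong_federation"
ALMOST_PUBLIC = "https://example.com/collections/almost_public"


def domain_matches(domain: str, listed: set) -> bool:
    """True if `domain` or any parent domain appears in `listed`."""
    return any(domain == d or domain.endswith("." + d) for d in listed)


def meta_collection_allows(addressees: set, requester: Optional[str],
                           allowlist: set, blocklist: set) -> bool:
    """Decide whether a requester falls inside a meta-collection the object was
    addressed to. `requester` is the actor ID proven by authentication, or None."""
    if requester is None:
        return False                  # both collections require authentication
    domain = (urlsplit(requester).hostname or "").lower()
    if not domain or domain_matches(domain, blocklist):
        return False                  # the blocklist always wins
    if STRONG_FEDERATION in addressees and domain_matches(domain, allowlist):
        return True
    return ALMOST_PUBLIC in addressees
```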
As I said earlier, maybe an individual user wants to create lists or groups too. The now-defunct Google+ had “Circles”, which appeals to me as a concept: you put people in a Circle based on their relationship to you. They don't see that they've been added to a Circle, or what that Circle is called. Activities addressed to the Circle effectively “bcc” its members. And don't forget, ActivityPub has Relationship objects, whose “object” can be a Collection. Public and private groups are still very much intended behavior.
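Here's a sketch of a Circle working as an implicit “bcc”. It assumes the server keeps Circle membership privately and expands it at send time, and the function names are hypothetical. The returned payload carries no “bto” or “bcc” and no trace of the Circle's ID or name. The server delivers that same payload to each recipient on the list.

```python
import copy

BLIND_FIELDS = ("bto", "bcc")


def strip_blind(obj: dict) -> dict:
    """Deep-copy an activity without bto/bcc, including on an embedded object."""
    clean = {k: copy.deepcopy(v) for k, v in obj.items() if k not in BLIND_FIELDS}
    if isinstance(clean.get("object"), dict):
        clean["object"] = strip_blind(clean["object"])
    return clean


def address_to_circle(activity: dict, circle_members: list) -> tuple:
    """Treat a private Circle as an implicit bcc.

    Returns (payload, recipients): the payload has no blind fields and never
    mentions the Circle; recipients are existing blind addressees plus members.
    Assumes bto/bcc, when present, are lists.
    """
    recipients = []
    for member in [*activity.get("bto", []), *activity.get("bcc", []), *circle_members]:
        if member not in recipients:          # deduplicate, keep first-seen order
            recipients.append(member)
    return strip_blind(activity), recipients
```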
Data ownership
At some level, all these strategies I'm describing are about giving instance operators and individual users control over their data. And we should have control. It's our data, and there are definitely threats out there that would use it in ways we don't approve of. So who owns an Activity? What do I mean by ownership?
In these cases I ask myself, “who should have the authority to delete or reject this Activity?”
To me that means:
- For activities that are not replies, not posted to any particular collection, or posted to the public collection, the poster should own the Activity.
- For replies to posts or threads, the original poster should have ownership.
- For messages posted to a collection, such as a followers list or a user's wall, the owner of the collection.
- For private or direct messages to a single person, the recipient.
- For private messages directed to a collection, the collection's owner (the recipient).
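The rules above could be encoded roughly as follows. The Post fields are hypothetical simplifications. The precedence (private messages first, then replies, then collections) is my reading rather than anything a spec defines.

```python
from dataclasses import dataclass
from typing import Optional

AS_PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


@dataclass
class Post:
    author: str
    thread_root_author: Optional[str] = None   # set for replies
    collection: Optional[str] = None           # e.g. a followers list or a wall
    collection_owner: Optional[str] = None
    private: bool = False
    direct_recipient: Optional[str] = None     # single-person private message


def owner(post: Post) -> str:
    """Who should have the authority to delete or reject this post."""
    if post.private and post.direct_recipient:
        return post.direct_recipient
    if post.private and post.collection_owner:
        return post.collection_owner
    if post.thread_root_author:
        return post.thread_root_author         # replies belong to the original poster
    if post.collection and post.collection != AS_PUBLIC and post.collection_owner:
        return post.collection_owner
    return post.author                         # plain or public posts stay with the poster
```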
In the vast majority of cases, this means you own your post. But reply-guys and harassers can and should get deleted or rejected.
Likewise, in cases where we don't have data ownership, we should think seriously about just... not caching that data. Authority over that data has been handed to the relevant party once the data's sent. Let it go.
Authenticating object fetches
Once we draw down the use of the AS Public collection, which mandates unauthenticated access, we can safely get very aggressive about requiring credentials for object fetches. I'd love to see some standards body (*cough*, Florence, *cough*) draft some conventions on the nuts and bolts for this.
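Until those conventions exist, the decision a server makes on an object fetch might look like this sketch. It assumes authentication has already happened upstream and produced a verified actor ID (for example, via the HTTP Signatures Mastodon already uses between servers). It also assumes the object's audience has already been expanded to individual actors. The function name is hypothetical.

```python
from typing import Optional

AS_PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def fetch_status(addressees: set, requester: Optional[str], audience: set) -> int:
    """HTTP status for a GET on an object.

    addressees: the object's stored addressing IDs.
    requester:  actor ID proven by e.g. an HTTP Signature, or None.
    audience:   the addressees already expanded to individual actors.
    """
    if AS_PUBLIC in addressees:
        return 200          # the spec requires unauthenticated access here
    if requester is None:
        return 401          # ask the caller to authenticate
    if requester in audience:
        return 200
    return 404              # don't even confirm the object exists
```

Answering 404 rather than 403 to authenticated outsiders is a design choice, so they can't probe for the existence of private posts.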
Other solutions: OCAP-LD (https://w3c-ccg.github.io/ocap-ld/)
Another approach suggested by the good folks over at Pleroma is OCAP-LD. When you fetch an object, you present cryptographic proof of a “capability”. This capability might be delegated from a more general source of authority, and might come with specific caveats or restrictions. It can also be delegated, sub-delegated, etc., and revoked at any delegation level. It's an interesting read, and they suggest they've got some compelling advantages over access control lists.
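To show the shape of the idea, here's a toy model of a delegation chain with caveats and revocation. It is not OCAP-LD's actual data format. There are no JSON-LD documents and no signatures, so the cryptographic proofs that tie each delegation to its delegator are omitted. Every name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Caveat = Callable[[dict], bool]   # returns True if the request satisfies it


@dataclass
class Capability:
    id: str
    invoker: str                              # who may use this capability
    parent: Optional["Capability"] = None     # None means root authority
    caveats: List[Caveat] = field(default_factory=list)
    revoked: bool = False


def authorize(cap: Capability, invoker: str, request: dict) -> bool:
    """Walk the chain to the root: every link must be unrevoked and every
    caveat along the way must hold for this request."""
    if cap.invoker != invoker:
        return False
    node: Optional[Capability] = cap
    while node is not None:
        if node.revoked:
            return False                      # revocation at any level cuts off the chain
        if not all(check(request) for check in node.caveats):
            return False
        node = node.parent
    return True
```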
I think it could be a part of the solution, but Public and Unlisted will seriously undercut our ability to engage systems like these.
The future is bright
We have so many tools to improve our platform. We have the technology, a mountain of security experience from email, moderation experience (and in some cases, woes) from Usenet and forums, and unfortunately, in a lot of cases, experience dealing with harassment and violence. Let's keep sharing that experience, learning from each other, and making this place better. Thanks!