Decentralised Social Media Explained: Inside Bluesky's Federated Architecture

On what makes a system decentralised and federated, Bluesky's current architecture and AT Protocol

Dec 13, 2024

This article is part of “Under the Hood”, a fairly new article series where I aim to deep dive into how real-world systems work. Check out the latest ones in this series: From Pixels to Information: A Comprehensive Guide on QR Codes and How to Cron: Advanced Data Structures for Discrete Event Simulation.

This time, we’ll have a look at how Bluesky is set up and go through the novel approaches they use to solve certain challenges. As of December 2024, Bluesky has crossed the 20 million users mark while being maintained by no more than 15 engineers. Martin Kleppmann, author of “Designing Data-Intensive Applications” and currently a technical advisor to Bluesky, published a paper1 outlining how Bluesky functions as a federated and decentralised social media platform. Gergely Orosz has two good articles covering the company’s early days, culture and evolutionary architecture if you would like to check them out.2 In the meantime, let’s start with the basics.

Centralised, decentralised and federated

Decentralised networks have no central authority or server, and all participants (nodes) are equal and have the same capabilities. Data and control are distributed across all nodes. If one node fails, the network continues to function. One example of a fully decentralised network is Bitcoin - no single entity controls it, and all participants maintain a copy of the ledger.

Federated networks have multiple independent servers that can communicate with each other, where each server has its own administrator and user base. Users on different servers can interact with each other. Each server follows its own rules while adhering to a common protocol. For example, email providers (Gmail, Yahoo, and others) run their own servers but can communicate with each other via standard protocols.

Comparison between the three types of networks: centralised, decentralised and federated. Red nodes represent servers or hubs. Green nodes represent users. Solid lines show direct connections. Dashed lines show federation between servers.

A system can be both federated and decentralised. This is often called a federated decentralised network. The federation part provides standard protocols for communication, interoperability between different servers/instances, and local administration and rules. The decentralisation part ensures that no central authority controls the entire network, that anyone can set up their own server/instance, and that there is no single point of failure.

Example of what a federated decentralised network would look like. Each cluster represents a federation of servers. Each federation has multiple servers (red nodes). Users (green nodes) are connected to different servers. Solid lines show direct connections within federations. Dashed lines show federation connections between groups, made via standard protocols.

Which category do X and Mastodon belong to?

X maintains complete centralised control over its platform. They have exclusive authority over your personal data, your identity, and content management (which includes both moderation policies and feed generation). While X has taken steps toward transparency by partially open-sourcing their algorithm3, users remain largely passive observers - able to understand parts of how the system works but with minimal ability to influence or modify its operation.

Mastodon is federated because different servers can communicate with each other using the ActivityPub protocol. It's partially decentralised because no one owns or controls the entire network and anyone can set up their own Mastodon server. If one server goes down, the rest of the network continues functioning. The key difference from purely decentralised systems (like Bitcoin) is that these networks maintain some organisational structure through federation while still preventing central control. They strike a balance between complete decentralisation and practical usability. One disadvantage, though, is that just as there are huge email providers, there are big Mastodon servers. Your user identity becomes tied to a server, and it can be challenging (if not impossible) to move to another server.

Bluesky architecture high-level overview

Bluesky uses a protocol called the AT Protocol (Authenticated Transfer Protocol)4 aka atproto, which works similarly to email in some ways. Just as email lets you use Gmail and still message someone with a Yahoo account, Bluesky lets you choose different providers (called Personal Data Servers or PDS) to host your account, communicate with users on different servers, and take your account and data with you if you want to switch providers.

The key ways in which Bluesky encourages decentralisation are that anyone can run their own server and connect to the network, users have data portability (they can move their account between providers), and the protocol is open, meaning anyone can build compatible apps or services.

Let’s go through the main components that make up Bluesky’s architecture.

The PDS (Personal Data Server) is the central part of the architecture. It hosts user repositories and data, manages user identity, orchestrates requests to other services, and handles data sync and federation. A PDS can be self-hosted or run by a provider (such as Bluesky's own PDS). The User Data Repository holds the user's identity (DID), the decentralised identifier used for authentication. It handles cryptographic authentication, signing and verification, and holds the record structures such as posts, follows, likes and profile data. To consume data, aggregate it to your liking and distribute it in real time, there are the Network Relays, which collect and index content from PDSs and then distribute it through the Firehose so that updates can be processed in real time. The firehose data is consumed by the Feed Generator (Discover) to create custom content feeds and by the Labeling Service (Ozone) to provide content moderation, tagging and content classification. The data from those two Opinionated Services is consumed by the App Views, which act as the user-facing platform.
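To make the data flow concrete, here is a minimal Python sketch of the pipeline described above. Every class and function name in it (PDS, Relay, labeler, feed_generator, app_view) is a hypothetical stand-in chosen for illustration, not part of any Bluesky SDK, and the real services talk over the network rather than through in-memory lists.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """One repository update as it travels through the network."""
    did: str          # author's permanent identifier
    collection: str   # record type, e.g. "app.bsky.feed.post"
    text: str
    labels: list = field(default_factory=list)


class PDS:
    """Hosts user repositories (here: just a list of events)."""
    def __init__(self):
        self.events = []

    def create_record(self, did, collection, text):
        self.events.append(Event(did, collection, text))


class Relay:
    """Collects events from many PDSs and exposes them as one stream."""
    def __init__(self, pdses):
        self.pdses = pdses

    def firehose(self):
        for pds in self.pdses:
            yield from pds.events


def labeler(event):
    """Toy moderation rule: tag anything containing the word 'spam'."""
    if "spam" in event.text.lower():
        event.labels.append("spam")
    return event


def feed_generator(events, keyword):
    """Toy custom feed: keep posts that mention a keyword."""
    return [e for e in events
            if e.collection == "app.bsky.feed.post" and keyword in e.text.lower()]


def app_view(feed):
    """Render the feed for the user, hiding anything labelled as spam."""
    return [f"{e.did}: {e.text}" for e in feed if "spam" not in e.labels]


pds_a, pds_b = PDS(), PDS()
pds_a.create_record("did:plc:alice", "app.bsky.feed.post", "Learning atproto today")
pds_b.create_record("did:plc:bob", "app.bsky.feed.post", "Buy atproto spam now")
pds_b.create_record("did:plc:bob", "app.bsky.graph.follow", "did:plc:alice")

relay = Relay([pds_a, pds_b])
labelled = [labeler(e) for e in relay.firehose()]
timeline = app_view(feed_generator(labelled, "atproto"))
print(timeline)
```

The point of the sketch is the separation of roles: the PDSs only store data, the relay only aggregates it, and the opinionated services (feeds and labels) can be swapped without touching either.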

High level overview of Bluesky’s architecture

Interoperation and federability

The protocol uses a system called "Lexicons". These are like agreed-upon vocabularies that define how different types of data (posts, likes, follows, etc.) should be structured. This standardisation allows different servers and clients to understand each other, similar to how different email clients all understand what an email header should look like.

Each server supports specific features by implementing "lexicons" - there are core ones (starting with com.atproto.*) that handle basic stuff like syncing user data, and social-focused ones (starting with app.bsky.*) that enable social media features.
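As a rough illustration of the idea, the Python sketch below classifies Lexicon identifiers by prefix and checks a record against a heavily simplified schema. The REQUIRED_FIELDS table and both helper functions are assumptions made for this example; real Lexicons are JSON schema documents with much richer type definitions.

```python
# Simplified stand-ins for Lexicon schemas: record type -> required fields.
# Real Lexicons describe field types, limits and more; this is an assumption
# made to keep the example short.
REQUIRED_FIELDS = {
    "app.bsky.feed.post": {"text", "createdAt"},
    "app.bsky.graph.follow": {"subject", "createdAt"},
}


def namespace(nsid):
    """Classify a Lexicon identifier by its prefix."""
    if nsid.startswith("com.atproto."):
        return "core protocol"
    if nsid.startswith("app.bsky."):
        return "social application"
    return "third party"


def is_valid(record):
    """Check that a record names a known type and carries its required fields."""
    required = REQUIRED_FIELDS.get(record.get("$type"))
    return required is not None and required.issubset(record.keys())


post = {
    "$type": "app.bsky.feed.post",
    "text": "Hello from a self-hosted PDS",
    "createdAt": "2024-12-13T00:00:00Z",
}
print(namespace("com.atproto.sync.getRepo"))  # core protocol
print(namespace(post["$type"]))               # social application
print(is_valid(post))                         # True
print(is_valid({"$type": "app.bsky.feed.post"}))  # False
```

Because every client and server checks records against the same shared definitions, a post written by one app can be read by another without either knowing about the other in advance.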

Unlike the regular web, which shares HTML pages that determine both content and appearance, the AT Protocol only shares the underlying data in a structured format. This means apps can display the same information in their own unique ways, without needing to download and run someone else's display code (HTML/JS/CSS). It's like getting pure information that each app can then present however it wants.5

The user data repository

A noteworthy feature is repositories ("repos"). Each user has their own personal repository that contains all their data - posts, follows, likes, etc. These repos are designed to be efficiently synced between servers, maintain a verifiable history of changes and allow users to take their entire social presence with them if they switch providers.

The AT Protocol (ATP) has some interesting technical characteristics that enable decentralisation. The core of ATP is built around something called "Self-Authenticating Data Structures". Every post, profile, and interaction is signed cryptographically, which means that data can be verified as authentic regardless of which server it comes from, that users truly own their identity through cryptographic keys, and that content can't be tampered with without detection.
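The tamper-detection half of this can be sketched with nothing more than hashing. The Python example below is a simplification under stated assumptions: it uses JSON and SHA-256 from the standard library, whereas atproto repositories use DAG-CBOR encoding and a Merkle Search Tree, and the signature over the root (made with the user's private key) is omitted entirely. The function names are hypothetical.

```python
import hashlib
import json


def content_hash(record):
    """Hash a record's canonical JSON form; any edit changes the hash."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def repo_root(records):
    """Combine per-record hashes into one digest covering the whole repo."""
    digest = hashlib.sha256()
    for key in sorted(records):
        digest.update(key.encode("utf-8"))
        digest.update(content_hash(records[key]).encode("utf-8"))
    return digest.hexdigest()


repo = {
    "app.bsky.feed.post/1": {"text": "Hello world"},
    "app.bsky.feed.like/1": {"subject": "some-post"},
}
# In the real protocol the user's key would sign this root (omitted here).
trusted_root = repo_root(repo)

# A server that alters a post cannot reproduce the trusted root.
tampered = {**repo, "app.bsky.feed.post/1": {"text": "Hello w0rld"}}
print(repo_root(repo) == trusted_root)      # True
print(repo_root(tampered) == trusted_root)  # False
```

Since the root depends on every record, a reader who trusts one signed root can verify the entire repository, no matter which server handed it over.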

An interesting technical choice is that ATP uses DIDs (Decentralised Identifiers) as the foundation for user identity, which are persistent identifiers that remain valid even if you change providers.

Every user has two identifiers:

A human-readable handle (like username.bsky.social or custom domain)
A permanent DID (Decentralised Identifier, like did:plc:xyz123...)

User Handle (e.g., alice.bsky.social)
       ↓
Resolves to DID (did:plc:xyz123...)
       ↓
DID is looked up on the PLC server
       ↓
Returns DID Document containing:
- Public key
- PDS server location
- Handle verification
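
The lookup chain above can be sketched in a few lines of Python. This is an illustration only: the two dictionaries are hypothetical in-memory stand-ins for the network lookups, the DID and URLs are made-up placeholders, and the document fields (handle, public_key, pds_endpoint) are simplified names rather than the real DID document structure.

```python
# Hypothetical stand-in for the handle lookup (done over the network in reality).
HANDLE_TO_DID = {
    "alice.bsky.social": "did:plc:alice-example",
    "mallory.example.com": "did:plc:alice-example",  # a handle claiming Alice's DID
}

# Hypothetical stand-in for the PLC server: DID -> simplified DID document.
PLC_DIRECTORY = {
    "did:plc:alice-example": {
        "handle": "alice.bsky.social",
        "public_key": "placeholder-public-key",
        "pds_endpoint": "https://pds.example.com",
    }
}


def resolve(handle):
    """Resolve a handle to its DID document and check the link both ways."""
    did = HANDLE_TO_DID.get(handle)
    if did is None:
        raise LookupError(f"no DID found for handle {handle!r}")
    doc = PLC_DIRECTORY.get(did)
    if doc is None:
        raise LookupError(f"no DID document found for {did!r}")
    # Handle verification: the document must point back to the same handle.
    if doc["handle"] != handle:
        raise ValueError(f"{did!r} does not claim handle {handle!r}")
    return doc


print(resolve("alice.bsky.social")["pds_endpoint"])  # https://pds.example.com

try:
    resolve("mallory.example.com")
except ValueError as err:
    print("rejected:", err)
```

The second lookup fails because a handle pointing at a DID is not enough on its own; the DID document has to confirm the handle as well.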

Users can change their handle without losing their identity because the DID remains constant. The DID document points back to your handle, creating a verification loop. Any change to the DID document must be signed with your private key, and each change extends a verifiable chain of updates (like a "mini-blockchain"). Because public data is self-verifiable, third parties can detect manipulation.
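
A minimal Python sketch of such a chain of updates follows, assuming a simplified log format: each entry stores the hash of the entry before it in a prev field. The helper names, field names and URLs are hypothetical, and the signature check that the real system performs on every update is left out because the standard library has no public-key signing.

```python
import hashlib
import json


def op_hash(op):
    """Hash one update in canonical JSON form."""
    canonical = json.dumps(op, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_update(log, changes):
    """Add an update that commits to the hash of the previous one."""
    prev = op_hash(log[-1]) if log else None
    log.append({"prev": prev, **changes})


def chain_is_valid(log):
    """Check that every entry links to the hash of its predecessor."""
    expected_prev = None
    for op in log:
        if op["prev"] != expected_prev:
            return False
        expected_prev = op_hash(op)
    return True


log = []
append_update(log, {"handle": "alice.bsky.social", "pds": "https://pds.example.com"})
append_update(log, {"handle": "alice.example.com", "pds": "https://pds.example.com"})
append_update(log, {"handle": "alice.example.com", "pds": "https://new-host.example.net"})
print(chain_is_valid(log))  # True

log[1]["handle"] = "mallory.example.com"  # rewrite history
print(chain_is_valid(log))  # False
```

The log shows a handle change followed by a move to a new host, all under the same identity. Hash linking alone only exposes edits to earlier entries; protecting the newest entry is the job of the signature that this sketch omits.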

Some limitations of the current approach are that it relies on the centralised DNS system, that there’s currently a single PLC server managed by Bluesky, and that most users' private keys are currently managed by Bluesky.

How decentralised and federated is Bluesky actually?

Bluesky doesn't use a traditional message-passing system like other federated networks (ActivityPub, XMPP, email). Instead, it uses a "shared heap" architecture where data exists in one global pool. Hosts do not pass messages directly to one another, even though direct message-passing is a common federation pattern. Most users are currently hosted on Bluesky's PDS (Personal Data Server) instances. The chat/DMs service is completely centralised, and most services rely on Bluesky's relay as a firehose.

The platform is not yet fully decentralised, as power is not evenly diffused throughout the system: most users run the Bluesky app, most developers work with Bluesky-defined application schemas (Lexicons), and most self-hosted users run Bluesky's PDS software. Running a full-network relay requires significant resources (16 TB of fast NVMe disk). Most DID PLC accounts don't have independently controlled PLC rotation keys configured.

One thing I would not debate is that much of the protocol and the network is reliant on Bluesky in material terms today. Almost everybody in the atproto network is hosted on a Bluesky PDS instance. Most self-hosted folks run the Bluesky PDS software. Most services use a Bluesky relay as a firehose. Most users run the Bluesky app. Most developers are working with the Bluesky-defined application schemas (Lexicons). - Bryan Newbold, Protocol Engineer at Bluesky 6

While Bluesky's underlying protocol (atproto) is designed to support decentralisation, in practice most of the network currently runs through Bluesky the company. Almost everyone uses Bluesky's servers, app, and relay services - making it functionally similar to Twitter/X today. The key difference is in the architecture.

It allows for "credible exit": users can leave without losing their data or connections. It supports independent Lexicons (like the whtwnd.com blogging platform). Every major infrastructure component can be substituted without friction. Users can self-host their own instances, and they own their identity and data. The protocol is designed to prevent any single party from moderating the entire network.

So, unlike Twitter, Bluesky's infrastructure is built to allow users and developers to run their own independent services, migrate their data, and participate in the network without relying on Bluesky's systems. This capability exists even though relatively few are using it right now.

1. Bluesky and the AT Protocol: Usable Decentralised Social Media

2. Building Bluesky: a Distributed Social Network (Real-World Engineering Challenges)

3. X’s Recommendation Algorithm

4. AT Protocol Documentation

5. AT Protocol Interoperation Overview Guide

6. Bryan Newbold’s Reply on Bluesky and Decentralisation

The Engineering Compass
