The more I read and consider Bluesky and this protocol, the more pointless -- and perhaps DANGEROUS -- I find the idea.
It really feels like no one is addressing the elephant in the room: okay, someone who makes something like this is interested in "decentralized" or otherwise bottom-up-ish levels of control.
Good goal. But then, when you build something like this, you're actually helping build a perfect decentralized surveillance record.
This is why I say that most of Mastodon's limitations and bugs in this regard (by leaving everything to the "servers") are actually features. The ability to forget and delete et al. is actually important, and this makes that HARDER.
I'm just kind of like, JUST DO MASTODON'S MODEL, like email. It's better and the kinks are more well thought about and/or solved.
When it comes to the internet, tech is law. There is no way to publicly share something and maintain control over it. Even on the Fediverse, if either a client or server wants to ignore part of the protocol or model, it can. Like a system message to delete particular posts for anti-surveillance reasons can simply be ignored by any servers or clients that were designed/modified for surveillance. Ultimately the buck lies with the owner of some given data to not share that data in the first place if there's a chance of misuse.
Author here. I think it's fair to say that AT protocol's model is "everyone is a scraper", including first party. Which has both bad and good. I share your concern here. For myself, I like the clarity of "treat everything you post as scraped" over "maybe someone is scraping but maybe not" security by obscurity. I also like that there is a way for me to at least guarantee that if I intentionally make something public, it doesn't get captured by the container I posted it into.
This seems like tensions between normal/practical and “opsec” style privacy thinking… Really, we can never be sure anything that gets posted on the internet won’t be captured by somebody outside our control. So, if we want to be full paranoid, we should act like it will be.
But practically lots of people have spent a long time posting their opinions carelessly on the internet. Just protected by the fact that nobody really has (or had) space to back up every post or time to look at them too carefully. The former has probably not been the case for a long time (hard drives are cheap), and the latter is possibly not true anymore in the LLM era.
To some extent maybe we should be acting like everything is being put into a perfect distributed record. Then, the fact that one actually exists should serve as a good reminder of how we ought to think of our communications, right?
Exactly. Anything that's ever been public on the internet is never really gone anyways, and it's unsafe to assume so. This is similar to publishing a website or a blog post. Plus, from a practical (non-opsec) point of view, you can delete items (posts, likes, reposts, etc.) on ATProto, and those items will disappear from whatever ATProto app you are using - usually even live. You need to dive into the protocol layer to still see deleted items.
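To make that concrete, here's a rough sketch of what a delete looks like through the protocol's own API, using the @atproto/api client. The handle, app password, and rkey are placeholders, so treat it as an illustration rather than a recipe:

  import { AtpAgent } from "@atproto/api";

  // Rough sketch: deleting one of your own records at the protocol level.
  // The handle, app password, and rkey are placeholders.
  const agent = new AtpAgent({ service: "https://bsky.social" });

  async function deletePost(rkey: string) {
    await agent.login({
      identifier: "alice.example.com",  // placeholder handle
      password: "xxxx-xxxx-xxxx-xxxx",  // placeholder app password
    });

    // Removes the record from your repo. Downstream indexers see the delete
    // event, but anyone who already copied the record may keep their copy.
    await agent.com.atproto.repo.deleteRecord({
      repo: agent.session!.did,
      collection: "app.bsky.feed.post",
      rkey,
    });
  }

  // deletePost("3abc123xyz2k");  // rkey placeholder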
It's not about the access, it's about the completeness. Imagine this paradigm takes off (I hope it does!), everyone has their own PDS and finally owns their data. Social apps link into their PDS to publish and share data exactly as they're supposed to.
Well now someone's PDS is a truly complete record of their social activity neatly organized for anyone that's interested. It's not a security issue, after all the data was still public before, but the barrier to entry is now zero. It's so low that you can just go to stalker.io, put in their handle, and it will analyze their profile and will print out a scary accurate timeline of their activity and location leveraging AI's geoguesser skill.
That is how it works, but people shouldn't be posting their location or sensitive information publicly if they don't want it exposed like that. That's basic opsec. Private data is currently being worked on for ATProto and will hopefully begin existing in 2026.
It's true that Mastodon is somewhat better if you don't want to be found, though it's hardly a guarantee. From a "seeing like a state" perspective, Bluesky is more "legible" and that has downsides.
But I think there's room for both models. There are upsides to more legibility too. Sometimes we want to be found. Sometimes we're even engaging in self-promotion.
Also, I'll point out that Hacker News is also very legible. Everything is immutable after the first hour and you can download it. We just live with it.
This is a line of thinking that just supposes we shouldn’t post things on the internet at all. Which, sure, is probably the right move if you’re that concerned about OPSEC, but just because ActivityPub has a flakier model doesn’t mean it isn’t being watched
This article goes into a lot of detail, more than is really needed to get the point across. Much of that could have been moved to an appendix? But it's a great metaphor. Someone should write a user-friendly file browser for PDS's so you can see it for yourself.
I'll add that, like a web server that's just serving up static files, a Bluesky PDS is a public filesystem. Furthermore it's designed to be replicated, like a Git repo. Replicating the data is an inherent part of how Bluesky works. Replication is out of your control. On the bright side, it's an automatic backup.
So, much like with a public git repo, you should be comfortable with the fact that anything you put there is public and will get indexed. Random people could find it in a search. Inevitably, AI will train on it. I believe you can delete stuff from your own PDS but it's effectively on your permanent record. That's just part of the deal.
So, try not to put anything there that you'll regret. The best you could do is pick an alias not associated with your real name and try to use good opsec, but that's perilous.
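To see how low the barrier is, here's a rough sketch of paging through someone's public repo with plain fetch and no credentials. com.atproto.repo.listRecords is the standard endpoint, but the PDS host and handle below are placeholder assumptions; a self-hosted account would be reached via its own PDS instead:

  // Rough sketch: page through someone's public repo, no credentials needed.
  const PDS_HOST = "https://bsky.social"; // placeholder: assumes a Bluesky-hosted account

  async function listPosts(repo: string) {
    let cursor: string | undefined;
    do {
      const url = new URL(`${PDS_HOST}/xrpc/com.atproto.repo.listRecords`);
      url.searchParams.set("repo", repo); // handle or DID
      url.searchParams.set("collection", "app.bsky.feed.post");
      url.searchParams.set("limit", "100");
      if (cursor) url.searchParams.set("cursor", cursor);

      const page = await (await fetch(url)).json();
      for (const rec of page.records) {
        console.log(rec.uri, rec.value.text);
      }
      cursor = page.cursor;
    } while (cursor);
  }

  listPosts("alice.bsky.social").catch(console.error); // placeholder handle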
My goal with writing is generally to move things out of my head in the shape that they existed in my head. If it's useful but too long, I trust other people to pick what they find valuable, riff on it, and so on.
>Someone should write a user-friendly file browser for PDS's so you can see it for yourself.
I've been thinking of this for some time, conceptually, but perhaps from a more fundamental angle. I think the idea of "files" is pretty dated and can be thrown out. Treat everything as data blobs (inspired by PerKeep[0]) addressed by their hashes, and many of the issues described in the article just aren't even a thing. If it really makes sense, or for compatibility's sake, relevant blobs can be exposed through a filesystem abstraction.
Also, users don't really want apps. What users want are capabilities. So not Bluesky, or YouTube for example, but the capability to easily share a life update with interested parties, or the capability to access yoga tutorial videos. The primary issue with apps is that they bundle capabilities, but many times particular combinations of capabilities are desired, which would do well to be wired together.
Something in particular that's been popping up fairly often for me: I'm in a messaging app, and I'd like to look up certain words in some of the messages, then perhaps share something relevant from it. Currently I have to copy those words over to a browser app for that lookup, then copy content and/or URL and return to the messaging app to share. What I'd really love is the capability to do lookups in the same window that I'm chatting with others. Like it'd be awesome if I could embed browser controls alongside the message bubbles with the lookup material, and optionally make some of those controls directly accessible to the other part(y|ies), which may even potentially lead to some kind of ad hoc content collaboration as they make their own updates.
It's time to break down all these barriers that keep us from creating personalized workflows on demand. Both at the intra-device level where apps dominate, and at the inter-device level where API'd services do.
[0] https://perkeep.org/
I'm using filesystem more as a metaphor than literally.
I picked this metaphor because "apps" are many-to-many to "file formats". I found "file format" to be a very powerful analogy for lexicons so I kind of built everything else in the explanation around that.
You can read https://atproto.com/specs/repository for more technical details about the repository data structure:
The repository data structure is content-addressed (a Merkle-tree), and every mutation of repository contents (eg, addition, removal, and updates to records) results in a new commit data hash value (CID). Commits are cryptographically signed, with rotatable signing keys, which allows recursive validation of content as a whole or in part. Repositories and their contents are canonically stored in binary DAG-CBOR format, as a graph of data objects referencing each other by content hash (CID Links). Large binary blobs are not stored directly in repositories, though they are referenced by hash (CID).
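As a sketch of what "content-addressed" means in practice, this is roughly how a record maps to a CID using the IPLD/multiformats libraries. The record itself is made up, and a real repo additionally wraps records in MST nodes and a signed commit:

  import * as dagCbor from "@ipld/dag-cbor";
  import { sha256 } from "multiformats/hashes/sha2";
  import { CID } from "multiformats/cid";

  // Sketch: how one record maps to a CID. The record below is made up.
  const record = {
    $type: "app.bsky.feed.post",
    text: "hello from my repo",
    createdAt: new Date().toISOString(),
  };

  const bytes = dagCbor.encode(record);            // canonical DAG-CBOR bytes
  const digest = await sha256.digest(bytes);       // sha-256 multihash
  const cid = CID.createV1(dagCbor.code, digest);  // content address of those bytes

  console.log(cid.toString()); // changes if any field of the record changes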
Re: apps, I'd say AT is actually post-app to some extent because Lexicons aren't 1:1 to apps. You can share Lexicons between apps and I totally can see a future where the boundaries are blurring and it's something closer to what you're describing.
I've always thought walled gardens are the effect of consumer preferences, not the cause.
The effect of the internet (everything open to everyone) was to create smaller pockets around a specific idea or culture. Just like you have group chats with different people, that's what IG and Snap are. Segmentation all the way down.
I am so happy that my IG posts aren't available on my HN or that my IG posts aren't being easily cross posted to a service I don't want to use like truth social. If you want it to be open, just post it to the web.
I think I don't really understand the benefit of data portability in the situation. It feels like in crypto when people said I want to use my Pokemon in-game item in Counterstrike (or any game). Like, how and why would that even be valuable without the context? Same with a Snap post on HN or a HN post on some yet-to-be-created service.
>I am so happy that my IG posts aren't available on my HN or that my IG posts aren't being easily cross posted to a service I don't want to use like truth social.
ATProto apps don't automatically work like this and don't support all types of "files" by default. The app's creator has to build support for a specific "file type". My app https://anisota.net supports both Bluesky "files" and Leaflet "files", so my users can see Bluesky posts, Leaflet posts, and Anisota posts. But this is because I've designed it that way.
Anyone can make a frontend that displays the contents of users' PDSs.
Here's an example...
Bluesky Post on Bluesky: https://bsky.app/profile/dame.is/post/3m36cqrwfsm24
Bluesky Post on Anisota: https://anisota.net/profile/dame.is/post/3m36cqrwfsm24
Leaflet post on Leaflet: https://dame.leaflet.pub/3m36ccn5kis2x
Leaflet post on Anisota: https://anisota.net/profile/dame.is/document/3m36ccn5kis2x
I also have a little side project called Aturi that helps provide "universal links" so that you can open ATProto-based content on the client/frontend of your choice: https://aturi.to/anisota.net
> I think I don't really understand the benefit of data portability in the situation.
Twitter was my home on the web for almost 15 years when it got taken over by a ... - well you know the story. At the time I wished I could have taken my identity, my posts, my likes, and my entire social graph over to a compatible app that was run by decent people. Instead, I had to start completely new. But with ATProto, you can do exactly that - someone else can just fork the entire app, and you can keep your identity, your posts, your likes, your social graph. It all just transfers over, as long as the other app is using the same ATProto lexicon (so it's basically the same kind of app).
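One concrete piece of that portability, sketched below: any repo can be pulled in full as a CAR file over the public sync API (com.atproto.sync.getRepo), no credentials needed. The PDS host and DID here are placeholders:

  import { writeFile } from "node:fs/promises";

  // Sketch: back up an entire repo (posts, likes, follows, ...) as a CAR file
  // over the public sync API. PDS host and DID are placeholders; no auth is
  // needed because the repo is public.
  const PDS_HOST = "https://bsky.social";
  const DID = "did:plc:abc123placeholder";

  async function backupRepo() {
    const url = `${PDS_HOST}/xrpc/com.atproto.sync.getRepo?did=${encodeURIComponent(DID)}`;
    const res = await fetch(url);
    if (!res.ok) throw new Error(`getRepo failed: ${res.status}`);
    const car = new Uint8Array(await res.arrayBuffer());
    await writeFile(`${DID.replaceAll(":", "_")}.car`, car);
    console.log(`saved ${car.byteLength} bytes`);
  }

  backupRepo().catch(console.error);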
I agree. I don't understand the driving force here.
I have all of the raw image files that I've uploaded to Instagram. I can screenshot or download the versions that I created in their editor. Likewise for any text I've published anywhere. I prefer this arrangement, where I have the raw data in my personal filesystem and I (to an extent) choose which projections of it are published where on the internet. An IG follow or HN upvote has zero value to me outside of that platform. I don't feel like I want this stuff aggregated in weird ways that I don't know about.
>The effect of the internet (everything open to everyone) was to create smaller pockets around a specific idea or culture. Just like you have group chats with different people, that's what IG and Snap are. Segmentation all the way down.
I actually agree with that. See from the post:
>For some use cases, like cross-site syndication, a standard-ish jointly governed lexicon makes sense. For other cases, you really want the app to be in charge. It’s actually good that different products can disagree about what a post is! Different products, different vibes. We’d want to support that, not to fight it.
AT doesn't make posts from one app appear in all apps by default, or anything like that. It just makes it possible for products to interoperate where that makes sense. It is up to whoever's designing the products to decide which data from the network to show. E.g. HN would have no reason to show Instagram posts. However, if I'm making my own aggregator app, I might want to process HN stuff together with Reddit stuff. AT gives me that ability.
To give you a concrete example where this makes sense. Leaflet (https://leaflet.pub/) is a macroblogging platform, but it ingests Bluesky posts to keep track of quotes from the Leaflets on the network, and display those quotes in a Leaflet's sidebar. This didn't require Leaflet and Bluesky to collaborate, it's just naturally possible.
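A sketch of what that looks like in app code: the app inspects each record's type and renders only the lexicons it understands, skipping the rest. The Bluesky NSID is real; the Leaflet-style NSID is a placeholder, since I haven't checked Leaflet's actual lexicon names:

  // Sketch: an app renders only the lexicons it understands.
  // "app.bsky.feed.post" is real; "pub.leaflet.document" is a placeholder NSID.
  type Renderable =
    | { kind: "bskyPost"; text: string }
    | { kind: "leafletDoc"; title: string }
    | { kind: "skip" };

  function toRenderable(value: any): Renderable {
    switch (value?.$type) {
      case "app.bsky.feed.post":
        return { kind: "bskyPost", text: value.text };
      case "pub.leaflet.document": // placeholder NSID
        return { kind: "leafletDoc", title: value.title ?? "(untitled)" };
      default:
        return { kind: "skip" }; // a lexicon this app doesn't speak
    }
  }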
Another reason to support this is that it allows products to be "forked" when someone is motivated enough. Since data is on the open network, nothing is stopping a product fork from being perfectly interoperable with the original network (meaning it both sees "original" data and can contribute to it). So the fork doesn't have to solve the "convince everyone to move" problem, it just needs to be good enough to be worth running and growing organically. This makes the space much more competitive. To give an example, Blacksky is a fork of Bluesky that takes different moderation decisions (https://bsky.app/profile/rude1.blacksky.team/post/3mcozwdhjo...) but remains interoperable with the network.
I know this is somewhat covered in another comment, but the concepts described in the post could have been reduced quite a bit, no offense Dan. While I like the writing generally, I would consider writing and then letting it sit for a few days, rereading, and then cutting chaff (editing). This feels like a great first draft written without feedback, and it could have greatly benefited from an editing process. I don't think the argument that you want to put out something for others to take and refine is really a strong one... a bit more time and refinement could have made a big difference here (and given you have a decently sized audience, that's worth keeping in mind).
remoteStorage[0] is still occasionally getting updates. https://solidproject.org is a somewhat newer, similar project backed by Tim Berners-Lee. (With its own baggage.)
[0]: https://remotestorage.io/
I think of those projects as working relatively well for private data, but public data is kinda awkward. ATProto is the other way around: it has a lot of infra to make public data feasible, but private data is still pretty awkward.
It's a lot more popular though, so maybe has a bigger chance of solving those issues? Alternatively, Bluesky keeps its own extensions for that, and starts walling those bits off more and more as the VCs amp up the pressure. That said, I know very little about Bluesky, so this speculation might all be nonsense.
remoteStorage seems aimed at apps that don't aggregate data across users.
AT aims to solve aggregation, which is when many users own their own data, but what you want to display is something computed from many of them. Like social media or even HN itself.
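A rough sketch of that aggregation shape: tail the network-wide stream of new records and compute something across many repos (here, a naive per-author post counter). The Jetstream hostname and message fields below are assumptions from memory, so check the current docs before relying on them:

  import WebSocket from "ws";

  // Sketch of aggregation across many users' repos via the event stream.
  // Hostname and message shape are assumptions; verify against current docs.
  const url =
    "wss://jetstream2.us-east.bsky.network/subscribe?wantedCollections=app.bsky.feed.post";

  const postsByAuthor = new Map<string, number>();
  const ws = new WebSocket(url);

  ws.on("message", (data) => {
    const evt = JSON.parse(data.toString());
    if (evt.kind === "commit" && evt.commit?.operation === "create") {
      postsByAuthor.set(evt.did, (postsByAuthor.get(evt.did) ?? 0) + 1);
    }
  });

  setInterval(() => {
    console.log(`seen new posts from ${postsByAuthor.size} distinct authors`);
  }, 10_000);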
This was a nice intro to AT (though I feel it could have been a bit shorter)
The whole thing seems a bit overengineered, with poor separation of concerns.
It feels like it'd be smarter to flatten the design and embed everything in the Records. Then other layers can be built on top of that.
Make every record include the author's public key (or signature?). Anything you need to point at, you'd either just give its hash, or hash + author public key. This way you completely eliminate this goofy filesystem hierarchy. Everything else is embedded in the Record.
Lexicons/Collections are just a field in the Record. Reverse-looking up the hash to find what it is is also a separate problem.
Yes. SSB and ANProto do this. We actually can simply link to a hash of a pubkey+signature which opens to a timestamped hashlink to a record. Everything is a hash lookup this way and thus all nodes can store data.
{:record {:person-key **public-key**
:type :twitter-post
:message "My friend {:person-key **danabramov-public-key**} suggested I make this on this HN post {:link **record-of-hn-post-hash**}. Look at his latest post {:link **danabramov-newtwitter-post-hash** :person-key **danabramov-public-key**} it's very cool!"}
:hash **hash-of-the-record**
:signature **signature-by-author**}
So everything is self-contained. The other features you'd build on top of this basic primitive
- Getting the @danabramov username would be done by having some lookup service that does person-key->username. You could have several. Usernames can be changed with the service. But you can have your own map if you want, or infer it from github commits :)) There are some interesting ideas about usernames out there. How this is done isn't specified by the Record
- Lexicon is also done separately. This is some validation step that's either done by a consumer app/editor of the record or by a server which distributes records (could be based on the :type or something else). Such a server can check if you have less than 300 graphemes and reject the record if it fails. How this is done isn't specified by the Record
- Collection.. This I think is just organizational? How this is done isn't specified by the Record. It's just aggregating all records of the same type from the same author I guess?
- Hashes.. they can point at anything. You can point at a webpage or an image or another record (where you can indicate the author). For dynamic content you'd need to point at webpage that points at a static URL which has the dynamic content. You'd also need to have a hash->content mapping. How this is done isn't specified by the Record
This kind of setup makes the Record completely decoupled from the rest of the "stack". It becomes much more of an independent, movable "file" (in the original sense that you have at the top) than the interconnected setup you end up with at the end.
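To make the tradeoffs concrete, here's a sketch of such a flat, self-contained record in code. This is the design described above, not how AT protocol actually works; it uses Node's built-in ed25519 support, and the field names are made up:

  import { createHash, generateKeyPairSync, sign, verify } from "node:crypto";

  // Sketch of the flat design described above (not how AT protocol works):
  // every record carries its author's public key, a content hash, and a
  // signature, so it can be verified in isolation.
  interface FlatRecord {
    personKey: string;  // author's public key (PEM)
    type: string;       // e.g. "twitter-post", just a field, not a hierarchy
    message: string;    // may embed other records by hash
    hash: string;       // sha-256 of the payload fields
    signature: string;  // ed25519 signature over the hash
  }

  const { publicKey, privateKey } = generateKeyPairSync("ed25519");

  function makeRecord(type: string, message: string): FlatRecord {
    const personKey = publicKey.export({ type: "spki", format: "pem" }).toString();
    const payload = JSON.stringify({ personKey, type, message });
    const hash = createHash("sha256").update(payload).digest("hex");
    const signature = sign(null, Buffer.from(hash), privateKey).toString("base64");
    return { personKey, type, message, hash, signature };
  }

  function verifyRecord(r: FlatRecord): boolean {
    const payload = JSON.stringify({ personKey: r.personKey, type: r.type, message: r.message });
    if (createHash("sha256").update(payload).digest("hex") !== r.hash) return false;
    return verify(null, Buffer.from(r.hash), r.personKey, Buffer.from(r.signature, "base64"));
  }

  console.log(verifyRecord(makeRecord("twitter-post", "hello"))); // true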
- How do you rotate keys? In AT, the user updates the identity document. That doesn't break their old identity or links.
- When you have a link, how do you find its content? In AT, the URL has identity, which resolves to hosting, which you can ask for stuff. (See the sketch below.)
- When aggregating, how do you find all records an application can understand? E.g. how would Bluesky keep track of "Bluesky posts". Does it validate every record just in case? Is there some convention or grouping?
Btw, you might enjoy https://nostr.com/, it seems closer to what you're describing!
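To make that second point concrete, here's a rough sketch of the resolution chain behind an at:// link: handle to DID, DID document to PDS, PDS to record. The hostnames are the common public ones and the handle/collection/rkey are placeholders, so treat the details as assumptions:

  // Rough sketch of resolving at://<identity>/<collection>/<rkey>.
  // Hostnames are the common public ones; handle, collection, and rkey are
  // placeholders. (did:web identities resolve differently than did:plc.)
  async function getRecordFromAtUri(handle: string, collection: string, rkey: string) {
    // 1. handle -> DID (DNS TXT or /.well-known/atproto-did also work)
    const r1 = await fetch(
      `https://public.api.bsky.app/xrpc/com.atproto.identity.resolveHandle?handle=${handle}`
    );
    const { did } = await r1.json();

    // 2. DID -> DID document -> PDS endpoint, via the PLC directory
    const didDoc = await (await fetch(`https://plc.directory/${did}`)).json();
    const pds = didDoc.service.find((s: any) => s.id.endsWith("#atproto_pds")).serviceEndpoint;

    // 3. ask that PDS for the record itself
    const r3 = await fetch(
      `${pds}/xrpc/com.atproto.repo.getRecord?repo=${did}&collection=${collection}&rkey=${rkey}`
    );
    return (await r3.json()).value;
  }

  // getRecordFromAtUri("alice.bsky.social", "app.bsky.feed.post", "3abc123xyz2k");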
This, Local-first Software [1], the Humane Web Manifesto [2], etc. make me optimistic that we're moving away from the era of "you are the product" dystopian enshittification to a more user-centric world. Here's hoping.
[1]: https://www.inkandswitch.com/essay/local-first/
[2]: https://humanewebmanifesto.com/
Bluesky is not huge, but 40M users is not nothing either. You don't get people to want this, you just try to build better products. The hope is that this enables us all to build better products by making them more interoperable by default. Whether this pans out remains to be seen.
AT Proto seems very overengineered. We already have websites with RSS feeds, which more or less covers the publishing end in a way far more distributed and reliable than what AT offers. Then all you need is a kind of indexer to provide people with notifications and discovery and you're done. But I suppose you can't sell that to shareholders because real decentralised technology probably isn't going to turn as much of a profit as a Twitter knockoff with a vague decentralised vibe to it that most users don't understand or care about.
I'd say some of the worldview is shared but the architecture and ethos is very different. Some major differences:
- AT tries to solve aggregation of public data first. I.e. it has to be able to express modern social media. Bluesky is a proof that it would work in production. AFAIK, Solid doesn't try to solve aggregation, and is focused on private data first. (AT plans private data support but not now.)
- AT embraces "apps describe their own formats" (Lexicons). Solid uses RDF which is a very different model. My impression is RDF may be more powerful but is a lot more abstract. Lexicon is more or less like *.d.ts for JSON.
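To illustrate the *.d.ts comparison, here's a minimal made-up lexicon and the declaration it effectively stands in for; com.example.* is a placeholder namespace, and the shape follows the published Lexicon format as I remember it:

  // A made-up lexicon ("com.example.recipe" is a placeholder NSID); the shape
  // follows the published Lexicon format for record types.
  const recipeLexicon = {
    lexicon: 1,
    id: "com.example.recipe",
    defs: {
      main: {
        type: "record",
        key: "tid",
        record: {
          type: "object",
          required: ["title", "createdAt"],
          properties: {
            title: { type: "string", maxGraphemes: 120 },
            ingredients: { type: "array", items: { type: "string" } },
            createdAt: { type: "string", format: "datetime" },
          },
        },
      },
    },
  };

  // ...which plays roughly the same role as this .d.ts-style declaration:
  interface ComExampleRecipe {
    $type: "com.example.recipe";
    title: string;
    ingredients?: string[];
    createdAt: string; // ISO 8601 datetime
  }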
agree! Social-media contributions as files on your system: owned by you, served to the app. Like the .svg specification allows editing in Inkscape or Illustrator, a post on my computer would be portable to Mastodon or Bluesky or a fully distributed p2p network.
As someone who has been explicitly designing social protocols since 2011, and who met Tim Berners-Lee and his team when they were building SOLID (before he left MIT and got funded to turn it into a for-profit, Inrupt), I can tell you that files are NOT the best approach. (And neither is SPARQL, by the way, Tim :) SOLID was publishing ACLs, for example, as web resources. Presumably you'd manage all this with CalDAV-type semantics.
But one good thing did come out of that effort. Dmitri Zagidulin, the chief architect on the team, worked hard at the W3C to get departments together to create the DID standard (decentralized IDs) which were then used in everything from Sidetree Protocol (thanks Dan Buchner for spearheading that) to Jack Dorsey’s “Web5”.
Having said all this… what protocol is better for social? Feeds. Who owns the feeds? Well that depends on what politics you want. Think dat / hypercore / holepunch (same thing). SLEEP protocol is used in that ecosystem to sync feeds. Or remember scuttlebutt? Stuff like that.
Multi-writer feeds were hard to do and abandoned in hypercore, but you can layer them on top of single-writer. That's where you get into joint ownership and consensus.
ps: Dan, if you read this, visit my profile and reach out. I would love to have a discussion, either privately or publicly, about these protocols. I am a huge believer in decentralized social networking and build systems that reach millions of community leaders in over 100 countries. Most people don’t know who I am and I’m happy w that. Occasionally I have people on my channel to discuss distributed social networking and its implications. Here are a few:
Ian Clarke, founder of Freenet, probably the first decentralized (not just federated) social network: https://www.youtube.com/watch?v=JWrRqUkJpMQ
Noam Chomsky, about Free Speech and Capitalism (met him same day I met TimBL at MIT) https://www.youtube.com/watch?v=gv5mI6ClPGc
Patri Friedman, grandson of Milton Friedman on freedom of speech and online networks https://www.youtube.com/watch?v=Lgil1M9tAXU
Yes, which is why by default, key management is done by your hosting. You log into your host with login/password or whatever mechanism your host supports.
Adding your own emergency rotation key in case your hosting goes rogue is supported, but is a separate thing and not required for normal usage. I'd like this to be more ergonomic though.
I was hoping this was literally just going to be some safe version of a BBS/Usenet sort of filesharing that was peer-based, kind of like torrents, but just simple and straightforward, with no porn, infected warez, ransomware, crypto-mining, racist/terrorist/nazi/maga/communist/etc. crap, where I could just find old computing magazines, homebrew games, recipes, and things like that.
yeah yeah yeah, everyone get on the AT protocol, so that the bluesky org can quickly get all of these filthy users off of their own servers (which costs money) while still maintaining the original, largest, and currently only portal to actually publish the content (which makes money[0]). let them profit from a technical "innovation" that is 6 levels of indirection to mimic activity pub.
if they were decent people, that would be one thing. but if they're going to be poisoned with the same faux-libertarian horseshit that strangled twitter, I don't see any value in supporting their protocol. there's always another protocol.
but assuming I was willing to play ball and support this protocol, they STILL haven't solved the actual problem that no one else is solving either: your data exists somewhere else. until there's a server that I can bring home and plug in with setup I can do using my TV's remote, you're not going to be able to move most people to "private" data storage. you're just going to change which massive organization is exploiting them.
I know, I know: hardware is a bitch and the type of device I'm even pitching seems like a costly boondoggle. but that's the business, and if you're not addressing it, you're not fomenting real change; you're patting yourself on the back for pretending we can algorithm ourselves out of late-stage capitalism.
[0] *potentially/eventually
>that the bluesky org can quickly get all of these filthy users off of their own servers (which costs money)
That's not correct, actually hosting user data is cheap. Most users' repos are tiny. Bluesky doesn't save anything by having someone move to their own PDS.
What's expensive is stuff like video processing and large scale aggregation. Which has to be done regardless of where the user is hosting their data.
> until there's a server that I can bring home and plug in with setup I can do using my TV's remote, you're not going to be able to move most people to "private" data storage
Quite a few BSky users are publishing on their own PDS (Personal Data Server) right now. They have been for a while. There are already projects that automate moving or backing up your PDS data from BSky, like https://pdsmoover.com/
Microblogging is also the least interesting part of the ATProto ecosystem. I've switched all my git hosting over to https://tangled.org and am loving it, not least of which is that my git server (a 'knot' in Tangled parlance) is under my control as a PDS and has no storage limits!
> When great thinkers think about problems, they start to see patterns. They look at the problem of people sending each other word-processor files, and then they look at the problem of people sending each other spreadsheets, and they realize that there’s a general pattern: sending files. That’s one level of abstraction already. Then they go up one more level: people send files, but web browsers also “send” requests for web pages. And when you think about it, calling a method on an object is like sending a message to an object! It’s the same thing again! Those are all sending operations, so our clever thinker invents a new, higher, broader abstraction called messaging, but now it’s getting really vague and nobody really knows what they’re talking about any more.
https://www.joelonsoftware.com/2001/04/21/dont-let-architect...
Author here! I grew up reading Joel's blog and am familiar with this post. Do you have a more pointed criticism?
I agree something like "hyperlinked JSON" maybe sounds too abstract, but so does "hyperlinked HTML". But I doubt you see web as being vague? This is basically web for data.
>To some extent maybe we should be acting like everything is being put into a perfect distributed record.
a record of what? Posts I wish to share with the public anyway?
>Someone should write a user-friendly file browser for PDS's so you can see it for yourself.
You can skip to the end of the article where I do a few demos: https://overreacted.io/a-social-filesystem/#up-in-the-atmosp.... I suggest a file manager there:
>Open https://pdsls.dev. [...] It’s really like an old school file manager, except for the social stuff.
And yes, the paradigm is essentially "everyone is a scraper".
https://pdsls.dev/ can serve this purpose IMO :) it's a pretty neat app, open source, and is totally client-side
edit: whoops, pdsls is already mentioned at the end of the article
>I don't feel like I want this stuff aggregated in weird ways that I don't know about.
That's just how it works and I accept the risk.
People concerned about that probably shouldn't publish on Bluesky. Private chat makes more sense for a lot of things.
That said it’s a very elegant way to describe AT protocol.
Why can’t we have nice things?
I guess that’s what Internet Archive is for.