It is actually worse than that. It is at least 30 days. There is an "almost" that is doing a ton of heavy lifting here "deletion after 30 days in almost all cases". My read of that is they can hang onto data for as long as they want, even if they usually won't. And "all traffic" with an agentic harness is basically your entire codebase you work on.
> We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases (see this post for further details). The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as help us identify and reduce false positives.
One of the major attack vectors is distillation, where millions of questions are auto-generated and coordinated to produce training data for new LLMs. Anthropic alleges Minimax, Deepseek and Kimi were trained this way. Deepseek 4 compares favorably to Opus, so they're probably trying to prevent Deepseek 5 from being a bootleg Mythos. https://www.anthropic.com/news/detecting-and-preventing-dist...
It's only for this model, not the one you're already using. And they're not training on the data. It's supposedly to detect abuse etc (such as someone retrying repeatedly with different variations to get around their protections)
Maybe, but to do so they'd need to offer new terms of service and we'd have to accept. I believe they'd lose a lot of their core business market if they did so.
I cannot help wondering if the 'we won't train on your data' applies across the fence over there in pentagon land, where the classified contracts be. Yeah, of course they are not connected. Or..
Present user-llm activity is a goldmine of intel the agencies literally spent lives and billions on getting hardly close to, yet they elect to just let this one slip by..
Maybe. Really, I don't dispute it.
But why? It's what, or precisely what, they always dreamed of.
Yes, that is your intended purpose of “git push”, it’s to save. And only if you use GitHub.
A better analogy here is probably “every time you use VS Code, the files you edit get sent to Microsoft”.
Some legitimate concerns:
• You have trade secrets. Previously; you can use services like Bedrock, etc, with signed contracts and significant reputations. Your contract is between AWS and you, and stays within your AWS security boundary.
• Security breaches. Remember when Anthropic accidentally published the source tree of Claude code? Or Meta’s recent AI recovery bot that didn’t check if the supplied recovery email was actually the email of the Instagram account? The best way to reduce your exposure is to minimise storage.
• Weaponised T&S. For example what if Anthropic decided to build a classifier for “usage in unsupported regions” that’s super overbearing (as we see with Fable) and vacuums up all context/input/output if there’s Mandarin? Contractually they could now retain it forever, not just 30 days, for ‘trust and safety purposes’ and perhaps have AI scan for any new or interesting ML techniques at scale, for Anthropic’s own use? They say just can’t train Claude models on the data.
A startup that uses agentic coding tools such as Claude Code or Codex is packaging up their entire codebase and sending it directly to their LM provider. Depending on their product, they might be sending it directly to a potential competitor.
people over-rate how much software/IP is useful in running a successful business. There are genuinely very few IP in this world that needs to be protected. Everyone else is running stupid CRUD apps
They also over index fear of LargeCo stealing IP from SmallCo. In fact, LargeCo is typically more scared about even the possibility of any product team looking at competitor internals due to lawsuits.
I’d be more scared of a data leak due to LargeCo being hacked than I would about LargeCo prying into the data.
What I don’t trust LargeCo with is personal information. I’ve heard too many horror stories about Govs and LargeCos swapping customer nudes or stalking ex’s to be comfortable with anything personal on those systems. But that’s a whole different topic.
actuaries look for data. visionaries take leaps in faith.
There was no data proving LLMs will work at scale.
Google waited for the Data. OpenAI and then Anthropic took the leap of faith.
The result is there for all to see.
The core attribute of a successful AI Researcher was were they AGI-pilled and not were they waiting for data for unknown unknowns?
Yes, it certainly is an odd situation when some people believe you cannot use Mythos-class models because security while others believe you must do code reviews with Mythos-class models because security.
they would kill their own product if they did this
it would be like if tsmc started designing their own chips to compete with the people they sell their services to, they have more to gain by limiting their participation to a specific corner
Fortunately I can't use Fable anyway, since their hyperactive content flaggers do not let you work on anything remotely biological or medical related (i.e. parse a CSV with some medical content, nope, you're probably a bioterrorist) and you get downgraded to Opus immediately.
I'm not even working on anything biological/medical, almost all PyTorch work is getting flagged (not even a safety notice and a downgrade, just an outright refusal with "this is against our ToS").
My 2 cents is that doctors people with lots of money and very specific needs who generally don't really go for tech jobs, so they're probably planning to create a separate monetization tier.
That, or alternatively, Mythos is so good at medical stuff, that it cam replace a lot of physician work 90% of the time, pissing off doctors, while the remaining 10% would result in very expensive lawsuits.
> That, or alternatively, Mythos is so good at medical stuff, that it cam replace a lot of physician work 90% of the time, pissing off doctors
Well they definitely don’t give a teaspoon of shit about putting people out of work by hawking munged-up versions of those people’s data, which was involuntarily ‘ingested’ for the benefit of society (in a way that happened to fuel a centabillion dollar industry.) So it’s prolly not that one.
Yes! I have hit the same brick wall. What sort of idiots are doing this? Honestly, I have no idea. And just before their IPO. SO far Anthropic marketing has been perfect and spotless. This is serious slipup.
>To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions. With more capable models arriving in the coming months, we’re working to improve our safeguards and reduce false positives as quickly as we can.
They don't want the real risk of someone using it to make biological or genetically targeted weapons, and they don't want the social risk of someone asking it a bunch of leading questions in order to 'prove' some racist thesis or to 'prove' Mythos is woke if it declines to along with their performative inquiry.
Let's face it, if some rando comes up to and asks if you have a few minutes to talk about population biology there's a good chance they're a kook.
Anthropic is desperate for the IPO and will release a half-baked product that they are so afraid to release, you can literally feel the shiver through the text of their press-release.
Now they want to have any way of either fixing it, or in case someone will actually make a big boo-boo with their model, to be able to blame the guy in the end.
« Trust us, we’re doing this for the good of humanity » (fills pockets with stock value and externalities from data center polloution) « No seriously trust us , at least we’re not Sam Altman »
Update: « Oh and we’re the only ones who will stop AI from turning into SkyNet and eating your babies, you just have to pay us to make sure we invent SkyNet first »
Are they really burning good will? For many users this is a deal breaker. But for the general public, politicians, etc they’re stamping “safety” on their brand.
Given the model intelligence plateau and public data exhaustion the only way to improve in customer use cases is by training the model on customer data.
If this is true, than Anthropic, Google and maybe OpenAI models will keep getting better and better and everyone else will be left in the dust - as they won't have access to so much customer data.
If they weren’t storing, they’d be oblivious to what customers are doing, making this kind of detection impossible. What data did they train their classifier on, if not real user (distiller) traffic?
Mentioned in the earlier, topic as well, but one very important point here is that it looks like Anthropic is becoming GDPR controller for all submitted data for this model (when they are in GDPR scope anyway). So data subjects would have Article 15 right to request information about processing and possibly a copy of the data. Latter might be contested under "rights of others", but former is more absolute.
What this means it that if someone makes an Article 15 request, they would be entitled to know if Anthropic holds personal data about them and also from who they received this data at minimum.
If someone wants to do that, I would recommend combining it with Article 18 request to forbid deleting the data for legal claim in case you contest Anthropic's reply. Otherwise they could just delete the data per their retention policy and DPA would find much later that they no longer hold the data.
Another issue here is that their DPA frames everything as controller-to-processor, i.e. they do not appear to have SCCs in place to actually receive this personal data as controller. So the original exporter would likely also be in breach if they send any GDPR covered personal data to this model.
This could be a big issue for firms with strict GDPR criteria:
"This change only applies to organizations that have set up workspaces with zero data retention (ZDR) in Claude Console, use Claude Code with ZDR in Claude Enterprise, or access Claude through AWS Bedrock, Google Cloud Agent Platform, or Microsoft Foundry with ZDR. The rest of this article applies only to these organizations."
Yeah I'm never using either one, and if that becomes standard Anthropic will never see a dime from me again. I'm going to draw the line in the sand right there.
I guess the better question would be if you are under and NDA and using an online model, are you already violating it but does this violate it further?
Your NDAs prohibit emailing a colleague about the e.g. project, or discussing it in a Slack DM with the client, or tracking progress on it in JIRA? You have to do NDA’d work exclusively with local tools or end-to-end encryption? Those are some difficult NDAs!
We use inhouse on-premises email, issue tracking, and messaging. Depending on the project, external communication does require E2EE email. Development happens on local hardware and software unless required otherwise by the customer.
I’m pretty sure (even just based on the revenue of various SaaS products) that’s not typical, hence “most NDAs”. I’m also sure some require a SCIF, but that’s not most of them.
I am definitely for services respecting customer privacy, but I can't help if this is different. I recently saw a thread where a person was bragging that frontier providers were blocking their attempt at what looked like to be social media de-anonymization and blackmailing app.
Maybe this isn't different than using something like Google Sheets to keep a list of people to dox and blackmail, but the leverage certainly makes it feel different.
There were two (expensive) exceptions / alternatives so far: Bedrock and Vertex. Their Zero Data Retention was in fact contractually enforced. Now it is all f...d because of these morons at Anthropic. For now I am better off just using DS via their API.
This is just a tragic moment for Tech. We just killed AI privacy. OpenAI already follows this trend and others will do too.
It’s not binary. With AWS previously you have contractual guarantees with a third party, that’s been in business for a couple decades, which explicitly state zero seconds of data retention - only as long as needed for inference.
Consider the security angle too. You now have to rely on Anthropic’s infrastructure security. You did not previously when you used Bedrock/Vertex/etc.
I'm talking about scouring Twitter/LinkedIn and look at posts from employees who say SOTA model is banned. Look at what the business do. Copy it using SOTA. Call their clients with 30% discount and faster turnaround and higher quality product.
It is complicated, but I can get Private Equity of even VCs to fund this idea.
tl;dr -- I'm actually agreeing with you. Anthropic will never copy your business model due to NDA. But there are plenty of fearmongering about they copying you and because of which you won't use their models. If their models are genuinely SOTA you can use that information to your advantage and crush scaredy-cats.
Edit: The fact that these get downvoted is exactly the reason why it's easy to win
I mean not just the part 30 days data retention but I think the serious trade of this product is just the token efficiency. They trade it for precision. The claims that they make that it found a 30 year software bug from millions of lines of code is just precision. To human it's looks like a lot but for it it's just the ablity to process (token processing). Let's see how long it runs. Peace.
All I can say to my team (and my clients): "f...k Anthropic". They've just put both Bedrock and Vertex on slippery slope of "we don't collect your prompts. period. ... comma ... except ..."
Right now we have changed the code of all our agents to data retention mode 'none' (Note: not "default" or "inherited", this is not enough now!) and we are fighting with GCP doco to set similar things for Vertex.
All he pre-publicity from Anthropic was about how it was amazing at finding security vulnerabilities, so it's not a stretch to think that some people would want to exploit that for nefarious purposes.
> We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases (see this post for further details). The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as help us identify and reduce false positives.
These terms seem to be updated at-will, so I'll take that with a grain of salt however.
When they literally just showed you they are being deceptive by sneaking in the weasel word “almost”?
Present user-llm activity is a goldmine of intel the agencies literally spent lives and billions on getting hardly close to, yet they elect to just let this one slip by..
Maybe. Really, I don't dispute it.
But why? It's what, or precisely what, they always dreamed of.
A better analogy here is probably “every time you use VS Code, the files you edit get sent to Microsoft”.
Some legitimate concerns:
• You have trade secrets. Previously; you can use services like Bedrock, etc, with signed contracts and significant reputations. Your contract is between AWS and you, and stays within your AWS security boundary.
• Security breaches. Remember when Anthropic accidentally published the source tree of Claude code? Or Meta’s recent AI recovery bot that didn’t check if the supplied recovery email was actually the email of the Instagram account? The best way to reduce your exposure is to minimise storage.
• Weaponised T&S. For example what if Anthropic decided to build a classifier for “usage in unsupported regions” that’s super overbearing (as we see with Fable) and vacuums up all context/input/output if there’s Mandarin? Contractually they could now retain it forever, not just 30 days, for ‘trust and safety purposes’ and perhaps have AI scan for any new or interesting ML techniques at scale, for Anthropic’s own use? They say just can’t train Claude models on the data.
Odd times we are living in!
They also over index fear of LargeCo stealing IP from SmallCo. In fact, LargeCo is typically more scared about even the possibility of any product team looking at competitor internals due to lawsuits.
What I don’t trust LargeCo with is personal information. I’ve heard too many horror stories about Govs and LargeCos swapping customer nudes or stalking ex’s to be comfortable with anything personal on those systems. But that’s a whole different topic.
I bet if you gave them the Codebase of the Gods, it’d be a heap of hacks inside a couple months.
Literally how LLMs will continue to learn to code and easily replace whatever you build with them.
Incredible that you could so blithely misunderstand this
Indeed, by a couple trillions...
Your email domain is significantly more important than whatever is in your corporate GitHub repositories.
it would be like if tsmc started designing their own chips to compete with the people they sell their services to, they have more to gain by limiting their participation to a specific corner
That, or alternatively, Mythos is so good at medical stuff, that it cam replace a lot of physician work 90% of the time, pissing off doctors, while the remaining 10% would result in very expensive lawsuits.
Well they definitely don’t give a teaspoon of shit about putting people out of work by hawking munged-up versions of those people’s data, which was involuntarily ‘ingested’ for the benefit of society (in a way that happened to fuel a centabillion dollar industry.) So it’s prolly not that one.
>To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions. With more capable models arriving in the coming months, we’re working to improve our safeguards and reduce false positives as quickly as we can.
Let's face it, if some rando comes up to and asks if you have a few minutes to talk about population biology there's a good chance they're a kook.
Now they want to have any way of either fixing it, or in case someone will actually make a big boo-boo with their model, to be able to blame the guy in the end.
Update: « Oh and we’re the only ones who will stop AI from turning into SkyNet and eating your babies, you just have to pay us to make sure we invent SkyNet first »
AWS Bedrock to require sharing data with Anthropic for Mythos and future models - https://news.ycombinator.com/item?id=48473166 - June 2026 (223 comments)
If they weren’t storing, they’d be oblivious to what customers are doing, making this kind of detection impossible. What data did they train their classifier on, if not real user (distiller) traffic?
What this means it that if someone makes an Article 15 request, they would be entitled to know if Anthropic holds personal data about them and also from who they received this data at minimum.
If someone wants to do that, I would recommend combining it with Article 18 request to forbid deleting the data for legal claim in case you contest Anthropic's reply. Otherwise they could just delete the data per their retention policy and DPA would find much later that they no longer hold the data.
Another issue here is that their DPA frames everything as controller-to-processor, i.e. they do not appear to have SCCs in place to actually receive this personal data as controller. So the original exporter would likely also be in breach if they send any GDPR covered personal data to this model.
I guess the better question would be if you are under and NDA and using an online model, are you already violating it but does this violate it further?
Maybe this isn't different than using something like Google Sheets to keep a list of people to dox and blackmail, but the leverage certainly makes it feel different.
This is just a tragic moment for Tech. We just killed AI privacy. OpenAI already follows this trend and others will do too.
The only hope now is ... tada .. Mistral LOL
Consider the security angle too. You now have to rely on Anthropic’s infrastructure security. You did not previously when you used Bedrock/Vertex/etc.
Step 2: Use SOTA models to copy them and crush them
Step 3: Profit.
(Yes, not every business is easily replicable, but you sure can find some)
I'm talking about scouring Twitter/LinkedIn and look at posts from employees who say SOTA model is banned. Look at what the business do. Copy it using SOTA. Call their clients with 30% discount and faster turnaround and higher quality product.
It is complicated, but I can get Private Equity of even VCs to fund this idea.
tl;dr -- I'm actually agreeing with you. Anthropic will never copy your business model due to NDA. But there are plenty of fearmongering about they copying you and because of which you won't use their models. If their models are genuinely SOTA you can use that information to your advantage and crush scaredy-cats.
Edit: The fact that these get downvoted is exactly the reason why it's easy to win
Right now we have changed the code of all our agents to data retention mode 'none' (Note: not "default" or "inherited", this is not enough now!) and we are fighting with GCP doco to set similar things for Vertex.
This is just terrible.
Would you elaborate? Not sure what you're describing