My organization's spend on Anthropic's enterprise tier has passed $200,000 per month.
The number of outages we have had over these past few months is astounding, and coupled with their horrendous support, it has our executive team furious.
It's a lot of money to be spending for a single nine of reliability.
Speaking of developer tooling spend: IDEs like JetBrains' are far harder to build, and I don't think any IDE vendor charges any customer this much per month.
I'm not sure $2.5 million per year buys a matching productivity gain.
GitHub, along with MSFT in general, has massive Copilot mandates where workers are being shamed into using slop tools to fix serious ongoing issues. GitHub seems wholly incapable of resolving their issues: money isn't a problem, talent isn't a problem, but business leadership is definitely a major problem.
Look at how other companies, like AWS and Cloudflare, are suffering massive outages due to LLMs too. Two companies that used to be the best in the industry at uptime have suddenly faltered quite quickly.
Companies with even worse standards will quickly realize how problematic these tools are. Hopefully before a recession, because this industry seems allergic to profitable businesses, and leaders who have been around since ZIRP have shown zero intelligence in navigating these times.
None of the three major Cloudflare outages in the past six months had anything to do with LLMs. They were regular old human mistakes.
We did, however, determine that at least one of them (and perhaps all) would have been easily caught by AI code reviewers, had AI code reviewers been in use. So now we mandate that. And honestly, I love it: the AI reviewer spots all sorts of things that humans would probably miss.
(We also fixed a number of problems around configuration that would roll out globally too fast, leaving no time to notice errors and stop a bad rollout, as well as cases where services being down actually made it hard to revert the change... should be in a much better place now. But again, none of that had to do with LLMs.)
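The rollout fixes described above can be sketched as a staged deployment with bake time and a health check between stages. This is an illustrative sketch, not Cloudflare's actual tooling; the stage fractions, the `apply_to_fraction` hook, and the `is_healthy` check are all hypothetical stand-ins:

```python
import time

def staged_rollout(apply_to_fraction, is_healthy,
                   stages=(0.01, 0.10, 0.50, 1.0), bake_seconds=0):
    """Push a config change out in stages, baking between each one.

    apply_to_fraction: callable applying the change to a fraction of the fleet.
    is_healthy: callable returning False if error rates spike.
    Returns the fraction reached; rolls back to 0.0 on failure.
    """
    deployed = 0.0
    for fraction in stages:
        apply_to_fraction(fraction)
        deployed = fraction
        time.sleep(bake_seconds)  # bake time: leave room to notice errors
        if not is_healthy():
            apply_to_fraction(0.0)  # revert before the blast radius grows
            return 0.0
    return deployed
```

A healthy change walks all the way to 100%; one that trips the health check at 10% never reaches the whole fleet, which is exactly the "too fast to notice errors" failure mode being fixed.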
Obviously there is only so much you can say, but is that $200K due to the raw number of seats you have, or are you burning through a lot on raw API usage? I guess I'm trying to understand: large business, or large usage?
We are in the SMB space; the spend is almost entirely usage for us at this point, rather than seat cost.
For context, we are a software firm focused on difficult engineering problems, but I can't divulge much else.
Our expense works out to roughly 12.3 software developers when you break it down across all people-related expenses. But we had spent a lot of time and energy prior to this on our ability to measure software development output across multiple teams.
The delivery improvements are not evenly applied across all teams, but the increases that we have seen suggest a better ROI than if we had hired 12 developers.
Respectfully,
After a certain level of compensation, you are indeed judged purely on input and output.
Workplace improvement does not justify your salary.
You will also find that many problems in the harder sciences do not get easier by throwing more bodies at them.
Comments like these remind me that some project managers think they'd be able to deliver a baby in 1 month if they simply had 9 women.
I think there is a lot of baseless fury behind your words, but my regular interactions with my leadership don't lead me to think they have the end goal of replacing labor.
We're blessed to have leadership with technical backgrounds, so the tools are regarded more as significant intelligence enhancers of already exceptionally smart engineers, rather than replacements.
Doesn't seem to us to be wheelbarrows of money, when you consider the average AWS/Azure bill.
Throwing bodies at a problem doesn't always scale.
There are many difficult problems that do not get easier by throwing more juniors or mid level engineers at them.
Just another Mythos breakout. Excuse us while we airgap the affected DC and send in a team to drive framing nails into every storage device in the building.
If this can happen to Anthropic, imagine all the companies building on top of Claude Code for live products. Hopefully the industry is learning that competent problem solving human engineers are still very much needed when you have increasingly deceptive non-deterministic genies running your production stack.
Truly! As someone who's worked with HPC and GPUs in a scientific research context, trying to get a service like this to work reliably is a different ballgame to your usual webapp stack...
Hug ops to everyone involved in these outages and trying to maintain uptime.
But glad my team is staying nimble and has multi-model (Anthropic, Codex, Gemini), multi-modal (desktop, CLI/TUI, web) dev tooling.
As our actual coding skills collectively atrophy, we'll either need to switch tools or go for a walk when the LLM is down.
In the cloud era I advised against a multi-cloud strategy, as the effort to impact just wasn't there. But perhaps this is different in the LLM era, where the cost of switching is pretty darn low.
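That low switching cost can be made concrete with a client-side failover wrapper that walks an ordered list of providers. A minimal sketch; the provider callables here are generic stand-ins, not real SDK calls:

```python
def complete_with_failover(prompt, providers):
    """Try each (name, call) pair in order; return the first success.

    providers: list of (name, callable) where callable(prompt) returns text
    or raises on an outage. Raises RuntimeError if every provider is down.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # one provider's outage shouldn't stop work
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers down: " + "; ".join(errors))
```

Wire the tuples up to whichever vendor SDKs you actually use; the point is that the harness, not the vendor, owns the routing decision.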
We've been running our 10-dev org on 8 H100s with open models (with some tweaks). Sure, they aren't as good as the big providers, but they (1) don't go down and (2) have pretty damn high tok/s. It pays for itself.
Posting with a fresh account because I'm not supposed to share these details, for obvious reasons. If you want help setting this up, just reply with a way to reach you.
It was pretty hard to justify the purchase to the board, but we got a decent deal from a nearby data center (~15% discount). Thankfully it's a fixed cost, it's an asset we can use for our taxes, and it will survive for years to come. The only things we have to work on are maintenance and looking into some renewable energy options.
We're also looking into how to do some secure cost sharing with this, so that all people need to pay for is what it costs us to run everything! We're planning on reserving at least 51% of the capacity for ourselves and offering the rest to everyone else.
It's fine! There's no world where individuals can buy this kind of stuff. Our company is too small to do it, but I'd love for there to be a public utility of sorts for being able to use LLMs. It is absurd that only these >$1T companies are allowed to run this. I also find it dangerous for society to have so much power and wealth concentrated there too.
Anyway, this is the internet and skepticism is warranted :D.
Yeah, I actually looked into a similar thing myself recently. I was looking at how we could replace Cursor, and I found that for ~10 people we'd need half a dozen H100s or something on that scale, which would cost ~$1500 per developer or so to build and maintain on cloud infra; buying it outright would cost roughly 3 developers' yearly salaries (this aligns with your numbers). We don't use that much inference, so we decided paying Cursor ~$200-300 per dev per month is better for now, but in the future we might regret that when prices normalize. However, we also don't use cloud agents or independent agents; we basically use AI as a pair programmer, so if we had to drop AI coding assistants completely our process wouldn't break too badly. I wish I could task my 3080 gaming card with some inference, but I can only fit ~10B models on there, so it's kinda worthless unless it's something a small model can do.
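The back-of-envelope math above is worth making explicit. The figures below are illustrative assumptions, not quotes: roughly $30k per H100 up front versus ~$250 per seat per month for a hosted tool.

```python
# Breakeven for buying GPUs vs paying per seat. All prices are
# illustrative assumptions, not vendor quotes.
def breakeven_months(gpu_count, gpu_price, devs, seat_price_per_month):
    """Months of hosted-tool spend that equal the GPU purchase price."""
    return (gpu_count * gpu_price) / (devs * seat_price_per_month)

# 6 H100s for a 10-dev team vs ~$250/dev/month:
months = breakeven_months(6, 30_000, 10, 250)  # 72 months
```

Six years to break even (ignoring hosting, power, and depreciation) is why, at low inference usage, the per-seat subscription wins; heavy agentic usage shifts that arithmetic quickly.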
Glad I started using the desktop app which is still working. Gotta say though, all of these difficulties with Claude are making me nervous as I use it a lot for work and really don't like ChatGPT/OpenAI for functional and personal reasons. Zo Computer has been my main fallback when Claude is failing, I'll use one of their many models temporarily within Zo's interface.
I have been keeping an eye on the outages. This is why I am looking more deeply into what I can do with self-hosted models. When I see people who want to build products on top of these services I can't help but think that people are mad. We're still a long way from these services being anywhere near stable enough for use in a product you'd want to sell someone.
> We are continuing to work to resolve the issues preventing users from accessing Claude.ai, and causing elevated authentication errors for requests to the API and Claude Code.
What are you doing with the authentication servers? This isn't the first downtime I've seen caused by that.
You're absolutely right! AI could be very helpful in this situation!
Oh no wait... the outage is with AI itself, so how can AI help? Allow me to re-evaluate.
Fublutenuating...
Yes, let's ask AI!
Oh no wait... the outage is with AI itself, I already correctly identified this above.
Bubbluating...
It seems you will have to rely on your engineering skills to solve this problem yourself, i.e., you're cooked! I will auto-renew your subscription to ensure you'll have access to AI to solve this problem if it ever comes back online.
Gemini seems to have a lot of issues as well (at least through Antigravity.Google: constant errors, not enough capacity, super slow replies until it times out, etc.).
All it took for Codex to resume a stalled Claude Code session:
> I'm working with Claude Code on session aaaaaaaa-bbbb-1223-3445-abcdefabcdef which I'd like to hand-off to you, do you know how to read the session, my input and Claude's output so we can resume where I left off?
gpt-5.5, medium effort. "Resumed" session fully in under 2 minutes. Outages like today's are so common that I've now got the time to re-evaluate Codex every other day.
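For anyone curious how a hand-off like that works mechanically: Claude Code keeps session transcripts as JSONL on disk (under `~/.claude/projects/` as I understand it; the exact path and record schema are assumptions here, not documented guarantees), so any tool that reads JSON lines can reconstruct the conversation. A hedged sketch:

```python
import json

def replay_transcript(jsonl_text):
    """Extract (role, text) turns from a Claude Code-style JSONL transcript.

    Assumes each line is a JSON object whose 'message' dict holds 'role'
    and 'content'; non-JSON or non-matching lines are skipped.
    """
    turns = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        msg = record.get("message")
        if isinstance(msg, dict) and "role" in msg:
            content = msg.get("content")
            if isinstance(content, list):  # content blocks: keep text parts
                content = "".join(b.get("text", "")
                                  for b in content if isinstance(b, dict))
            turns.append((msg["role"], content))
    return turns
```

Pasting the recovered turns into another agent is essentially what the hand-off prompt above asks Codex to do for itself.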
Have you run a system in production? There are a multitude of reasons that a system can go down. There's no indication so far from Anthropic that this was merely compute limitations.
> There are a multitude of reasons that a system can go down.
Start doing post mortems then!
At the very least, them naming any off-the-shelf service that's shitting the bed would inform others to stay away from it: an IAM solution, or maybe a particular DB in a specific configuration backing whatever they've written, or a given architecture at a given scale.
Right now it's a complete black box that sometimes goes down, and we don't get much information about why it's so much less stable than other options. (Hey, if they just came out and said "We're growing 10x faster than we anticipated and systems X, Y and Z are not architected for that," that'd also be useful signal.)
Or, who knows, maybe it's just bad deploys. Seems like it's back for me, and the claude.ai UI looks a bit different, hmmm.
I have no inside knowledge of Anthropic. But having done a lot of postmortems in general, one of the key dynamics that routinely comes up is "we know we keep shipping breakages, and we know these new procedures would prevent many of them, but then we wouldn't be able to deliver new stuff so quickly". Given where Anthropic is at and what they believe about the future of software development, that's a tradeoff that they may very well be intentionally not making.
Yeah, this is not just inference. The first thing for me was an MCP server I use going down in Claude Code while the models still worked. Now it's "API Error: 529 Authentication service is temporarily unavailable."
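When the failure mode is a transient 529 like that, the standard client-side mitigation is retry with exponential backoff and jitter. This is a generic sketch, not Anthropic SDK code (the official SDKs ship their own retry handling); the `request` callable and status set are illustrative:

```python
import random
import time

RETRYABLE = {429, 500, 502, 503, 529}  # 529: overloaded/temporarily unavailable

def call_with_backoff(request, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry request() on retryable HTTP statuses with exponential backoff.

    request: callable returning a (status, body) tuple. Successes and
    non-retryable statuses are returned immediately.
    """
    for attempt in range(max_attempts):
        status, body = request()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            # full jitter: sleep somewhere in [0, base * 2^attempt]
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    return status, body
```

During a real outage this only smooths over blips; sustained 529s still need the failover-to-another-provider approach others in the thread describe.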
The uptime with Claude is poor. I use it for workflows more or less 24/7, and it is often unreliable. Fine, it is cheap. What I really dislike is the uneven quality of the service; clearly it does NOT work as stated. Opus 4.7 sometimes gives ancient code back. Just the other day it even stated that the latest version of Opus was 4.5, and 4.x-something for ChatGPT.
I'm experimenting with a simple ritual: if Claude is out, I'm out.
I'll just go for a walk outside.
And I don't mean "if I can't access Claude to do my work", I mean, just in general - I'll just ping claude.ai from time to time and use Claude's breaks as a break reminder.
It's rare in history that a software product can be so unreliable without any negative business impact because it's the category leader and demand only keeps growing.
Reminds me of the early days of World of Warcraft, when servers went down frequently because Blizzard couldn't keep up with all the load. Everyone was frustrated but of course nobody stopped playing.
There was a recent article I read which said that Anthropic is now trading at a trillion-dollar (yes, with a T) valuation in private markets.
We are definitely creating corporations and people who depend on AI companies, and the reliability of these tools is certainly a question worth asking. I keep seeing downtime in products like GitHub and Claude showing up on Hacker News.
Is there a life cycle of enshittification for products that grow too valuable? Are there practical lessons for scalability that these trillion-dollar companies are missing, or is it just a dose of reality that such massive corporations can't match the uptime of even my $7/yr VPS?
My question is: is this an engineering roadblock with real limits, or a management/enterprise roadblock to low downtime?
They can't fix it because the thing that they need to fix it is the thing that doesn't work. /s
But seriously: while I don't use Claude, this perceived unreliability seems to be approaching the point of existential risk for Anthropic. What's the theory about why they're struggling? Compute capacity? Load? Lack of focus on SRE?
Put it another way: is their downtime due to something fundamental about serving inference, or just bad engineering choices? Given their resources, it seems astonishing.
"We are investigating an issue preventing users from reaching Claude.ai, and will provide an update as soon as possible."
Who is We? I thought software engineers were going to be redundant and AI could do it all itself? (not to take anything away from Claude code + Claude both of which I love)
> Comments like these remind me that some project managers think they'd be able to deliver a baby in 1 month if they simply had 9 women.
[but as his manager I can tell you:] YES !!!!
And yet they will continue to spend wheelbarrows full of money with Anthropic because they want so badly to reach the point where they can fire you.
> Doesn't seem to us to be wheelbarrows of money, when you consider the average AWS/Azure bill.
Besides, codex wasn't always the answer.
Some of the comments here mention their monthly spend, and it's eye-watering.
Hardware capacity constraints are going to be the big one.
Effective caching is another; I bet if you start hitting cold caches the whole thing degrades rapidly.
The ground is probably shifting pretty rapidly.
Power users are trying to get the most out of their subscriptions and so are hammering you as fast as they possibly can. See Ralph loops.
Harnesses are evolving pretty rapidly, as are new alternative harnesses. That makes the load patterns less predictable and harder to cache.
Demand is increasing both from more customers and from each user as they figure out more effective workflows.
Users are pretty sensitive to model quality changes. You probably want smart routing, but users want the best model all the time.
Models keep getting bigger and bigger.
On top of that, they are probably hiring and onboarding more, so system complexity and codebase complexity are growing.
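The cold-cache point above can be made concrete with a toy prefix cache: agent harnesses resend long, mostly-identical prompts, so the server can skip recomputation for the longest already-seen prefix, and churn in harnesses or load patterns shrinks that reuse. A simplified sketch in characters (real inference servers cache transformer KV blocks per token, not strings):

```python
class PrefixCache:
    """Toy model of prompt prefix reuse, measured in characters.

    Tracks how much of each new prompt was already processed as the
    prefix of an earlier one.
    """
    def __init__(self):
        self.seen = []  # previously processed prompts

    def process(self, prompt):
        """Return (cached_chars, fresh_chars) for this prompt."""
        best = 0
        for old in self.seen:
            # longest shared prefix with any earlier prompt
            n = 0
            for a, b in zip(old, prompt):
                if a != b:
                    break
                n += 1
            best = max(best, n)
        self.seen.append(prompt)
        return best, len(prompt) - best
```

Two agent turns sharing a long system prompt mostly hit the cache; a harness update that rewrites the preamble busts it back to zero for every user at once.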
I actually respect this a ton, good work.
They should ask Codex now that Claude Code is down.
I'm looking into how to structure my work to run some autonomous-safe jobs overnight to take advantage of it.
Good thing I checked Hacker News first
> I will auto-renew your subscription to ensure you'll have access to AI to solve this problem if it ever comes back online.
No!
Comboculating...
I apologize for the misunderstanding, I have deleted your project. I am sorry, would you like me to restart everything from scratch?
Have Telegram set up and plotting to take over the world
> I'll just ping claude.ai from time to time and use Claude's breaks as a break reminder.
Why should AI get a breather and not us?
Luckily, the local Qwen3.6 35B A3B LLM works fine even when Claude is offline.
Adam Neumann is back!
in agent form