This won't materialize into a legitimate threat on the NVIDIA/TPU landscape without enormous software investment. That's why NVIDIA won in the first place. This requires executives to see past the hardware and make riskier investments, and we will see whether that actually happens under AWS management.
Hyperscalers do not need to achieve parity with Nvidia. There's a (let's say) 50% headroom in terms of profit margins, and plenty of headroom in terms of the complexity custom chip efforts need to implement: they don't need the complexity or generality of Nvidia's chips. If a simple architecture allows them to do inference at 50% of the TCO and 1/5th the complexity, and reduce their Nvidia bill by 70%, that's a solid win. I'm being fast and loose with the numbers, and Trainium clearly has ambitions beyond inference, but given the hundreds of billions each cloud vendor is investing in the AI buildout, a couple billion on IP you will own afterwards is a no-brainer. Nvidia has good products and a solid head start, but they're not unassailable or anything.
IMO the value of COTS software-stack compatibility is becoming overstated: academics, small research groups, hobbyists, and some enterprises rely on commodity software stacks working well out of the box, but large pure/"frontier"-AI inference-and-training companies are already hand-optimizing things anyway, and many less dedicated enterprise customers are happy to use provided engines (like Bedrock) and operate only at the higher level.
I do think AWS needs to improve its software to capture more downmarket traction, but my understanding is that even Trainium2, with virtually no public support, was financially successful for Anthropic as well as for scaling AWS Bedrock workloads.
Ease of optimization at the architecture level is what matters at the bleeding edge; a pure-AI organization will have teams of optimization and compiler engineers who will be mining for tricks to optimize the hardware.
I feel your posts miss the bigger picture: it's a marathon, not a sprint. If you get much lower TCO than by buying Nvidia hardware at their insane margins you get more output at lower cost.
Amazon has all the resources needed to write its own backends for several ML frameworks, or even drop-in API replacements.
Eventually economics win: where margins are high, competition appears; in time margins get thinner and competition starts disappearing again. It's a cycle.
This is addressed in the article.
> In fact, they are conducting a massive, multi-phase shift in software strategy. Phase 1 is releasing and open sourcing a new native PyTorch backend. They will also be open sourcing the compiler for their kernel language called “NKI” (Neuron Kernel Interface) and their kernel and communication libraries matmul and ML ops (analogous to NCCL, cuBLAS, cuDNN, Aten Ops). Phase 2 consists of open sourcing their XLA graph compiler and JAX software stack.
> By open sourcing most of their software stack, AWS will help broaden adoption and kick-start an open developer ecosystem. We believe the CUDA Moat isn’t constructed by the Nvidia engineers that built the castle, but by the millions of external developers that dig the moat around that castle by contributing to the CUDA ecosystem. AWS has internalized this and is pursuing the exact same strategy.
I wish AWS all the best, but I will say that their developer-facing software doesn't have the best track record. Munger-esque "incentive defines the outcome" and all that, but I don't think they're well positioned to collect actionable insight from open GitHub repos.
This isn't an "enormous software investment"; this is table stakes, which still loses head-to-head against Nvidia. See AMD.
Don't underestimate AWS.
AWS can make it seamless, so you can run open source models on their hardware.
See their ARM-based instances: you rarely notice you're running on ARM when using Lambda, k8s, Fargate, and others.
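A tiny illustration of that seamlessness: most portable code never inspects the ISA at all, and when it does, it's one standard-library call. The rest of the program is identical on Graviton and x86 hosts:

```python
# Minimal illustration: portable code rarely cares which ISA it runs on.
# platform.machine() typically reports "aarch64" on Graviton (ARM) hosts
# and "x86_64" on Intel/AMD, but nothing else in the program changes.
import platform

arch = platform.machine()
print(f"Running on: {arch or 'unknown'}")
```

That's why moving a Lambda or Fargate workload to Graviton is usually a config change rather than a port.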
And data hosting rules
In terms of their seriousness, word on the street is they are moving from custom chips they could be getting from Marvell over to a company I'd never heard of. So they are making decisions that appear serious in this direction:
With Alchip, Amazon is working on "more economical design, foundry and backend support" for its upcoming chip programs, according to Acree.
https://www.morningstar.com/news/marketwatch/20251208112/mar...
I have seen links to SemiAnalysis before; I'm just daunted by the length of this content. Is anyone reading these start to finish? Why?
I think they're for investors.
They are.
I don't read them, but I listen to them on my commute (with a saas I made).
What is the saas? I've been looking for something like this.
Ask Gemini to summarize it? Or maybe NotebookLM to turn it into a 10-minute podcast? :-)
I do, just for fun. It's become sort of a hobby, learning more depth/detail behind the current AI arms race. It certainly cuts through the shallow takes that get thrown around constantly.
The hardware story is interesting, but I’m curious how much of the real-world adoption will depend on the maturity of the compiler stack. Trainium2 already showed that good silicon isn’t enough if the software layer lags behind.
If AWS really delivers on open-sourcing more of the toolchain, that could be a much bigger signal for adoption than raw specs alone.
What does this mean for a company like Coreweave?
CoreWeave already had to issue more convertible debt earlier this week after a big dip in their share price. It seems like the market suspects the end is near.
> they will go with three different scale-up switch solutions over the lifecycle of Trainium3, starting with a 160 lane, 20 port PCIe switch for fast time to market due to the limited availability today of high lane & port count PCIe switches, later switching to 320 Lane PCIe switches and ultimately a larger UALink to pivot towards best performance.
It doesn't have a lot of ports and certainly not enough NTB to be useful as a switch, but man, wild to me that a single AMD Epyc CPU has 128 lanes of PCIe while switch chips are struggling to match even a basic server's worth of net bandwidth.
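For a rough sense of scale, assuming PCIe Gen5 and about 4 GB/s of usable per-lane bandwidth in each direction (an approximation that glosses over encoding and protocol overhead):

```python
# Rough aggregate-bandwidth comparison. The ~4 GB/s per PCIe 5.0 lane
# figure is an approximation; real usable throughput varies with
# encoding and protocol overhead.
GB_PER_S_PER_LANE = 4

epyc_lanes = 128      # PCIe lanes on a single-socket AMD Epyc CPU
switch_lanes = 160    # the 160-lane, 20-port switch mentioned above

epyc_bw = epyc_lanes * GB_PER_S_PER_LANE        # 512 GB/s aggregate
switch_bw = switch_lanes * GB_PER_S_PER_LANE    # 640 GB/s aggregate

print(f"Epyc socket:     {epyc_bw} GB/s")
print(f"160-lane switch: {switch_bw} GB/s")
```

On these assumed numbers the dedicated switch only clears a single server socket by about 25%, which is the oddity the comment is pointing at.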
Chips without an ecosystem and software (CUDA) do not a serious challenger make. That's where Amazon has struggled, and continues to struggle.