I recently learned that the IBM Z series mainframes are generally compatible with software written for the legendary IBM 360 launched in 1964. While I'm sure there are caveats, maintaining any backward compatibility with a platform from over 60 years ago is impressive.
Having started on 8-bit microcomputers and progressed to various desktop platforms and servers, I always found mainframes to be esoteric hulking beasts: fascinating, but mysterious to me. In recent years I've started expanding my appreciation of classic mainframes and minis through reading blogs and retro history. This IEEE retrospective on the creation of the IBM 360 was eye-opening. https://spectrum.ieee.org/building-the-system360-mainframe-n...
Having read pretty deeply on the evolution of early computers from the ENIAC era through Whirlwind, CDC, early Cray and DEC, I was familiar with the broad strokes but I never fully appreciated how much the IBM 360 was a major step change in both software and hardware architecture. It's also a dramatic story because it's rare for a decades-old company as successful and massive as IBM to take such a huge "bet the company" risk. The sheer size and breadth of the 360 effort as well as its long-term success profoundly impacted the future of computing. It's interesting seeing how architectural concepts from the 360 (as well as DEC's popular PDP-8, 10 and 11) went on to influence the design of early CPUs and microcomputers. The engineers and hobbyists creating early micros had learned computers in the late 60s and early 70s mostly on the 360s and PDPs which were ubiquitous in universities.
After reading the IEEE article I linked above, I got the book the article was based on ("IBM: The Rise and Fall and Reinvention of a Global Icon"). While it's a thorough recounting of IBM's storied history, it wasn't what I was looking for. The author specifically says his focus was not the technical details as he felt too much had been written from that perspective. Instead that book was a more traditional historian's analysis which I found kind of dry.
I don't mean this in a condescending way at all, but really out of sheer curiosity: Who uses mainframes nowadays and what for?
I find mainframes fascinating, but I'm so unfamiliar with them that I don't know what I'd ever use one for, or why (as opposed to "traditional" hardware or cloud services).
Besides all answers given already, one of the reasons Unisys keeps selling Burroughs, aka ClearPath MCP, is its security model.
ESPOL/NEWP is one of the very first systems programming languages, being safe by default, with unsafe code blocks.
The whole OS was designed with security first in mind (think Rust in 1961), so their customers are companies that take security very seriously, not just ones running COBOL.
At least partially - the technical introduction for the Z17 says that several items can be concurrently maintained (IBM-speak for hot-swapped). So far as major items like the processing units - maybe (still reading).
> Four Power Supply Units (PSUs) that provide power to the CPC drawer and are accessible from the rear. Loss of one PSU leaves enough power to satisfy the power requirements of the entire drawer. The PSUs can be concurrently maintained
I would think their customers would demand zero downtime. And hey - if BeOS could power down CPUs almost 30 years ago I would expect a modern mainframe to be able to do it.
The Unisys Clearpath/MCP runs on Xeons, so I don't think there is CPU hot-swapping.
I don't know about physically removing a drawer, but on IBM Z, if there is an unrecoverable processor error, the processor will be shut down and a spare processor brought online to take over, transparently.
I don't know how licensing/costs tie into the CPU/RAM spares.
Large institutions (corporations, governments) that have existed more than a couple decades, and have large-scale mission-critical batch processes that run on them, where the core function is relatively consistent over time. Very few, if any, new processes are automated on mainframes at most of these places, and even new requirements for the processes that depend on the mainframe may be built in other systems that process data before or after the mainframe workflows. But the cost and risk of replacing well-known, battle-tested systems, finely tuned by years of ironing out misbehavior, often isn't warranted without some large-scale requirements change that invalidates the basic premises of the system. So, they stay around.
Thanks for that, and yeah, that fits with what I've found: mostly continuation of legacy, critical systems that were built on mainframes. It just seems shocking to me how much IBM still invests in developing those machines, given that no one seems to want to use them anymore.
It feels like I must be missing something, or maybe just underestimating how much money is involved in this legacy business.
IBM mainframes are extremely profitable. There are ~1,000 customers who cannot migrate off mainframes and are willing to spend almost any amount to keep their apps working. Mainframe profits subsidize all of IBM's other money-losing divisions like cloud.
> given that no one seems to want to use them anymore
According to a 2024 Forrester Research report, mainframe use is as large as it's ever been, and expected to continue to grow.
Reasons include (not from the report) hardware reliability and scalability, and the ability to churn through crypto-style math in a fraction of the time/cost of cloud computing.
Report is paywalled, but someone on HN might have a free link.
"Crypto-style" here, I am guessing--but not entirely certain--from a downstream comment, is intended as cryptographic in the more general sense, and not "cryptocurrency" as "crypto"-alone is often used for these days?
They have a lot of cryptographic functionality built directly into hardware, and IBM is touting quantum resistant cryptography as one of the key features. You won't mine Bitcoin on one, but, if you are concerned a bad actor could get a memory or disk dump of your system and store it until quantum computers become practical, IBM says they have your back.
All these legacy answers don't really make sense for this Z17... it's a new mainframe supporting up to 64T of memory and specialized cores for matrix math/AI applications. I have a hard time believing that legacy workloads are calling for this kind of hardware.
It also has a ton of high availability features and modularity that _does_ fit with legacy workloads and mainframe expectations, so I'm a little unclear who wants to pay for both sets of features in the same package.
You won't see mainframes doing AI training, but there is a lot of value in being able to do extremely low-latency inference (which is why they have their NPU on the same chip as the CPUs, just a few clock-cycles from the cores) during on-line transaction processing, and less timing-critical inference work on the dedicated cards (which are a few more nanoseconds away).
Additionally, IBM marketing likes the implication that mainframe CPUs are 'the best'. If you can buy a phone with an AI processor, it only makes sense that your mainframe must have one too. (And IBM will probably charge you to use it.)
If I were a bank, I'd order one of those and put all the financial modelling and prediction load on it. Like real time analysis to approve/deny loans, do projections for deciding what to do in slower moving financial products, predicting some future looking scenarios on wider markets, etc. etc. simultaneously.
That thing is a dreadnought matmul machine with some serious uptime, and can crunch numbers without slowing down or losing availability.
Or, possibly, you can implement a massively parallel version of WOPR/Joshua on it and let it rip scenarios for you. Just don't connect it to live systems (not entirely joking, though).
P.S.: I'd name that version of Joshua JS2/WARP.
> I don't mean this in a condescending way at all, but really out of sheer curiosity: Who uses mainframes nowadays and what for?
There's probably some minor strategic relevance here. E.g. for the government which has some workloads (research labs, etc.) that suit these systems, it's probably a decent idea not to try and migrate over to differently-shaped compute just to keep this CPU IP and its dev teams alive at IBM, to make sure that the US stays capable of whipping up high-performance CPU core designs even if Intel/AMD/Apple falter.
Those customers don't use mainframes, they use POWER. There's been a handful of POWER supercomputers in the past decade built for essentially that reason.
POWER is not uncommon in HPC, but IBM i (which is very enterprisey) is also based on POWER. You won't find IBM mainframes in HPC, but that's because HPC is not as sensitive to latency and reliability as online transaction processing is, and, with mainframes, you are paying for that, not for TFLOPS.
I understand that a company I work with uses a few, and is migrating away from them.
It seems clear to me that, before there were robust systems for orchestrating across multiple servers, you would install a mainframe to handle massive transactional workloads.
What I can never seem to wrap my head around is whether there are still applications in typical business settings where a massive machine like this is a technical requirement, or whether it's just that the costs of switching are monumental.
Bank payment processing is the primary example - being able to tell whether a specific transaction is fraudulent in less than 100 milliseconds - but there are other businesses with similar requirements. Healthcare is one of them, and fraud detection is getting a lot more sophisticated with the on-chip NPUs within the same latency constraints.
Cloud is basically an infinitely scalable mainframe. You have dedicated compute optimised for specific workloads, and you string these together in a configuration that makes sense for your requirements.
If you understand the benefits of cloud over generic x86 compute, then you understand mainframes.
Except that now you need to develop the software that gives mainframes their famed reliability yourself. The applications are very different: software developed for cloud always needs to know that part of the system might become unavailable and work around that. A lot of the stack, from the cluster management ensuring a failed node gets replaced and the processes running on them are spun up on another node, all the way up to your code that retries failed operations, needs to be there if you aim for highly reliable apps. With mainframes, you just pretend the computer is perfect and never fails (some exaggeration here, but not much).
Also, reliability is just one aspect - another impressive feature is their observability. Mainframes used to be the cloud back then, and you can trace resource usage in exquisite detail, because we used to bill clients by CPU cycle. Add to that the built-in hardware reliability features (for instance, IBM mainframes have memory in RAID-like arrays).
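To make the "your code that retries failed operations" part concrete, here is a minimal sketch of the kind of retry wrapper cloud apps end up carrying around. Everything in it (`TransientError`, `call_with_retries`, `make_flaky`, the delay values) is made up for illustration and not taken from any particular framework.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a node or network failure."""


def call_with_retries(operation, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Run operation(), retrying on TransientError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            # Double the wait each time and add jitter so that many clients
            # do not all retry at the same instant.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))


def make_flaky(failures):
    """Return an operation that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def operation():
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientError("node unavailable")
        return "ok"

    return operation
```

Usage would look like `call_with_retries(make_flaky(2))`: the first two calls fail, the third succeeds. On a mainframe the claim is that most of this lives below the application instead of inside it.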
Pretty much every Fortune 500 company that's been around for more than 30 years. Batch processing primarily - from closing their books to order processing; it differs by company. But if you ask the right person, they'll tell you where it's at.
Fortune 500? More like Fortune 50000 (ok, exaggeration). But there are so many banks in the world, and their automation can run back to the 1950s. They are only slowly moving away from mainframes, if only because a rewrite of a complex system that nobody understands is tough, and possibly very costly if it is the key to billions of euros/dollars/...
IBM prices processors differently for running COBOL and Java - if you run mostly Java code, your monthly licensing fees will be vastly different. On boot, the service elements (their BMCs - on modern machines they are x86 boxes running Linux) load microcode according to a saved configuration - some CPUs will run z/OS, some will run z/OS and COBOL apps, some will run Java, some will run z/VM, some will run Linux. This is all configured on the computer whose job is to bring up the big metal (first the power, then the cooling, and only then, the compute). Under everything on the mainframe side is the PR/SM hypervisor, which is, IIRC, what manages LPARs (logical partitions, completely isolated environments sharing the same hardware). The cheapest licensing is Linux under a custom z/VM (those machines aren't called z but LinuxONE), and the most expensive is the ability to run z/OS and COBOL code. Running Java under z/OS is somewhat cheaper. Last time I saw it, it was very complicated.
Everyone's predisposed to think that "mainframes are obsolete", but why not use a mainframe?
I mean, no one except for banks can afford one, let alone make back the opex or capex, and so we all resort to MySQL on Linux, but isn't the cost the only problem with them?
Banks smaller than the big ~5 in the US cannot afford anything when it comes to IT infrastructure.
I am not aware of a single state/regional bank that wants to have their IBM on premise anymore - at any cost. Most of these customers go through multiple layers of 3rd party indirection and pay one of ~3 gigantic banking service vendors for access to hosted core services on top of IBM.
Despite the wildly ramping complexity of working with 3rd parties, banks still universally prefer this over the idea of rewriting the core software for commodity systems.
A Rockhopper 4 Express, a z16 without z/OS support (running Linux), was in the mid 6 digits. It's small enough to co-locate in a rack with storage nodes. While z/OS will want IBM storage, Linux is much less picky.
IBM won't release the numbers, but I am sure it can host more workloads in the same chassis than the average 6-digit Dell or Lenovo.
Density is also bad. You spend 4 full racks and get 208 cores. Sure, they might be the fastest cores around, but that gets you only so far when even an off-the-shelf Dell server has 2x128-192 cores in 1U. Similarly, 64 TB of RAM is a lot, but that same Dell can have 3 TB of RAM. If I'm reading the specs correctly (they are a bit confusing), the z17 has only 25G networking (48 ports); the Dell I'm checking can have 8x200G network ports and can also do 400G networking. So the single 1U commodity server has more network bandwidth than the entire 4-rack z system.
Sure, there will be a lot of overhead in having tens to hundreds of servers vs a single system, but for lots of workloads it is manageable and certainly worth the tradeoff.
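For anyone who wants to check the arithmetic in the comparison above, here is a quick sketch. The figures are only the ones quoted in the comment (not verified against any vendor spec sheet), and the variable names are mine.

```python
import math

# Figures as quoted in the comment above; treat them as assumptions.
z17 = {"racks": 4, "cores": 208, "ram_tb": 64, "ports": 48, "gbit_per_port": 25}
dell_1u = {"cores_low": 2 * 128, "cores_high": 2 * 192,
           "ram_tb": 3, "ports": 8, "gbit_per_port": 200}

# Aggregate network bandwidth: ports times per-port speed.
z17_net_gbit = z17["ports"] * z17["gbit_per_port"]
dell_net_gbit = dell_1u["ports"] * dell_1u["gbit_per_port"]

# Core density per rack for the mainframe.
z17_cores_per_rack = z17["cores"] / z17["racks"]

# How many of the 1U servers it takes to match the mainframe's RAM.
servers_to_match_ram = math.ceil(z17["ram_tb"] / dell_1u["ram_tb"])

print(f"z17 network: {z17_net_gbit} Gbit/s, 1U server: {dell_net_gbit} Gbit/s")
print(f"z17 cores per rack: {z17_cores_per_rack}")
print(f"1U servers needed to match z17 RAM: {servers_to_match_ram}")
```

So on these numbers the single 1U box does win on network (1600 vs 1200 Gbit/s) and on core count, while matching the memory takes 22 of them, which is where the "overhead of tens of servers" trade-off comes in.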
Not the only problem with them. Not as easy to find skilled staff to operate them. Also, you become completely dependent on IBM (not fully terrible -- it's a single throat to choke when things go wrong).
By that, do you mean banks, payment networks or both? And I guess I'd be curious as to why mainframes versus the rest. It seems like the answer for "why" is mainly because it started on mainframes and the cost of switching is really high, but I wonder if there isn't more to it.
Edit: Oh yeah, just saw MasterCard has some job posting for IBM Mainframe/COBOL positions. Fascinating.
Not just that. Most operating systems lie about when an IO transaction completes for performance reasons. So if you lose power or the IO device dies you still think it succeeded. A mainframe doesn't do that... it also multiplexes the IO so it happens more than once so if one adapter fails it keeps going. The resiliency is the main use case in many situations. That said IME 99.995% of use cases don't need a mainframe. They just don't need to be that reliable if they can fail gracefully.
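To illustrate the "lie about when an IO transaction completes" point on a commodity OS: a plain write usually returns once the data is in a cache, and you have to ask explicitly for it to be pushed to the device. A minimal sketch, where `durable_write` and the file name are made up for the example:

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes and return only after the OS was asked to flush to the device."""
    with open(path, "wb") as f:
        f.write(data)          # lands in Python's buffer, then the OS page cache
        f.flush()              # push Python's buffer down to the OS
        os.fsync(f.fileno())   # ask the OS to push its cache to the storage device


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "ledger.bin")
    durable_write(path, b"txn-0001")
    with open(path, "rb") as f:
        result = f.read()
```

Without the `flush()` and `os.fsync()` calls the write would still appear to succeed, which is the behaviour the comment is describing. Even with them, durability still depends on the drive honoring the flush, so this is a sketch of the idea rather than a guarantee.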
I think what you are referring to is the "sub capacity" pricing model wherein a rolling average of resource consumption is used for billing. They've transitioned to newer models circa cloud technology, but it's mostly the same idea with more moving parts.
The advantage of this model from a business operations standpoint is that you don't have to think about a single piece of hardware related to the mainframe. IBM will come out automagically and fix issues for you when the mainframe phones home about a problem. A majority of the system architecture is designed on purpose to enable seamless servicing of the machine without impacting operations.
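A rough sketch of the rolling-average billing idea mentioned above, assuming the bill is driven by the peak of the rolling average over the period. The window length, the sample data, and the function name are placeholders of mine, not IBM's actual terms.

```python
from collections import deque


def peak_rolling_average(samples, window):
    """Return the highest average over any `window` consecutive samples."""
    if window <= 0 or len(samples) < window:
        raise ValueError("need at least `window` samples")
    recent = deque(maxlen=window)
    total = 0.0
    peak = 0.0
    for value in samples:
        if len(recent) == window:
            total -= recent[0]   # this value is evicted by the append below
        recent.append(value)
        total += value
        if len(recent) == window:
            peak = max(peak, total / window)
    return peak


hourly_usage = [10, 10, 80, 90, 20, 10, 10, 10]  # made-up utilization numbers
billable = peak_rolling_average(hourly_usage, window=4)
print(billable)
```

With these made-up numbers the short spike to 90 is billed as 50.0, because it gets averaged with the quieter hours around it. That smoothing is the point of the model: you pay for sustained consumption rather than for the installed hardware.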
> IBM will come out automagically and fix issues for you when the mainframe phones home about a problem. A majority of the system architecture is designed on purpose to enable seamless servicing of the machine without impacting operations.
I'd rather have a fault-tolerant distributed software system running on commodity hardware, that way there's a plurality of hardware and support vendors to choose from. No lock-in.
> I'd rather have a fault-tolerant distributed software system running on commodity hardware, that way there's a plurality of hardware and support vendors to choose from. No lock-in.
But then you'd have to develop it yourself. IBM has been doing just that for 60 years (on the 360 and its descendants).
That's like a toy drone company trying to compete with DARPA. Not even close.
These kinds of monsters run in critical environments such as airports, with AS400 or similar terminals being used by secretaries. These kinds of workloads, with their reliability, security and testing requirements, are no joke. At all. This is not your general purpose Unix machine.
I haven't worked with mainframes since the z10, but back then you could get into an entry model for about $100k.
Though the sky is the limit. The typical machine I would order had a list price of about 1 million. Of course no one pays list. Discounts can be pretty substantial depending on how much business you do with IBM or how bad they want to get your business.
The big problem is that everything in IBM-z world is negotiated, and often covered by NDAs. The pricing is complicated by which operating systems and what sort of workloads you'll be running, and what service level guarantees you need. The only published pricing in the entire life of the IBM 360/370/390/z-series line was the LinuxONE when it was first released... Hardware plus OS, excluding storage, was $70k on the low end.
Previous generation machines that came off-lease used to be listed on IBM's web site. You could have a fully-maxed-out previous-generation machine for under $250k. Fifteen years ago I was able to get ballpark pricing for a fully-maxed-out new machine, and it was "over a million, but less than two million, and closer to the low end". That being said, the machines are often leased.
If you go with z/VM or z/VSE, the OS and software are typically sold under terms that are pretty much like normal software, except that the price varies depending on the capacity level of the machine, which may be less than the physical number of CPUs in the machine, since that is a thing in IBM-land.
If you go for z/OS, welcome to the world of metered billing. You're looking at tens of thousands of dollars in MRC just to get started, and if you're running the exact wrong mix of everything, you'll be spending millions just on software each month. There's a whole industry that revolves around managing these expenses. Still less complicated than The Cloud.
Hercules is _not_ used by IBM's own developers. Being found with Hercules on your computer at IBM gets you in trouble. I know people who work on mainframe-related stuff inside IBM and they steer well clear of Hercules. And I've heard that IBM's computer monitoring stuff (antivirus, asset protection, etc.) looks for Hercules and flags it.
But IBM _does_ have their own mainframe emulator, zPDT (z Personal Development Tool), sold to their customers for dev and testing (under the name zD&T -- z Development and Test), and to ISVs under their ISV program. That's what IBM's own developers would be using if they're doing stuff under emulation instead of LPARs on real hardware.
(And IBM's emulator is significantly faster than Hercules, FWIW, but overall less feature-full and lacks all of the support Hercules has for older architectures, more device types, etc.)
There was something of a legal fight between IBM and TurboHercules SAS, a company that tried to force IBM to license z/OS to their users. IBM has been holding a grudge ever since (probably on the advice of their legal department).
You can run the emulator but you will not get your hands on new versions of the operating system to run on it.
But there are old versions that you can get your hands on.
You don't buy a mainframe, it's consumption based pricing. They aren't just going to list a price, because they need to size the hardware to what they think the workload will be.
Could they just list prices? Sure. Will they ever do it? No.
It depends on how full those drawers are. $250k to $1m would be the typical price range.
It's easier and harder at the same time to buy older hardware. That's half the challenge though because the software is strictly licensed and you pay per MIPS.
Here's a kid who bought a mainframe and then brought it up:
It's probably impossible to say because of the service contracts that come with it. Nobody would buy one brand new and not pay for support and consulting too.
I'm completely fascinated by the diagram. In a four-rack system, 2.5 racks are dedicated to I/O, half a rack is just empty, and the remainder is the actual processing and memory.
The I/O probably isn't endless networking adaptors, so what is it?
“The IBM z17 supports a PCIe I/O infrastructure. PCIe features are installed in PCIe+ I/O drawers. Up to 12 I/O drawers per IBM z17 can be ordered, which allows for up to 192 PCIe I/O and special purpose features.
For a four CPC drawer system, up to 48 PCIe+ fan-out slots can be populated with fan-out cards for data communications between the CPC drawers and the I/O infrastructure, and for coupling. The multiple channel subsystem (CSS) architecture allows up to six CSSs, each with 256 channels.
The IBM z17 implements PCIe Generation 5 (PCIe+ Gen5), which is used to connect the PCIe Generation 4 (PCIe+ Gen4) dual port fan-out features in the CPC drawers. The I/O infrastructure is designed to reduce processor usage and I/O latency, and provide increased throughput and availability.”
There's also the problem that they need to take floor loading into account. They're not going to tell a customer upgrading from an older machine to a new one, "oh, by the way, the rack weighs twice what it used to, so you'll need to upgrade your floor while you're at it." This is especially important for raised floors.
Probably channels. In an IBM mainframe, each I/O device is connected on its own channel, which is actually a separate computer that handles the transfer to/from the main CPU. This has been the case going back to the System/360, which is why mainframes are legendary for their transaction throughput and reliability. There's probably a lot of redundancy in the I/O hardware, as they have to be rock solid and hot swappable while the system is running.
Reading about mainframes feels very much like reading science fiction. Truly awesome technology that exists on a completely different plane of computing than anything else.
Yeah, a few years ago there was a Talos (I think) desktop motherboard that had a POWER8 CPU in it. It was expensive due to low production runs but I wish it had taken off. I think IBM is up to POWER9 now, but I doubt there are any personal motherboards for it.
> POWER10, however, contained closed firmware for the off-chip OMI DRAM bridge and on-chip PPE I/O processor, which meant that the principled team at Raptor resolutely said no to building POWER10 workstations, even though they wanted to.
I bought mine - I own two Talos systems and a Blackbird - because I want choices in ISA, and if people want that, they'll need to spend the money. (It helps that I'm a notorious pro-PowerPC bigot, having used POWER since the RS/6000 MicroChannel days.) Other than ARM, they're the only architecture with both general OS support and processing power in the same ballpark, and while IBM may sometimes be dunderheaded, they aren't going anywhere, either.
They aren't cheap and they aren't for everyone. But it meets my needs and it puts my money where my mouth is.
If I understand you correctly it means the primary use case is security applications that need transparency at all levels on a system that is roughly equivalent to mainstream platform performance features. Is that accurate?
As I see it, these systems have two potential markets (some natural overlap exists): the first being people like me who don't want to necessarily feed the x86-64 beast if they can avoid it, and the second indeed being people who want a completely auditable, transparent platform. In both cases to be a practical alternative the computer needs to have similar performance to what you'd get from a modern desktop, and while POWER9 is no longer cutting edge, it still generally delivers.
If I could afford one I would, not because of security, but just the geek in me finds it cool.
Back in the 90s and early 2000s, there were several non-x86 architectures that were more powerful, and even 64 bit long before Intel ever did. The DEC alpha, SPARC, and others. I was also too poor to afford those back then but I remember them fondly.
I believe Intel and AMD motherboards all have proprietary firmware and the Talos systems are puritanically open. (But the processor itself is closed so there's that. Could have gone with SPARC Niagara which was open sourced https://www.cnet.com/tech/computing/sun-open-sources-second-... )
The microarch is closed and IBM-specific. However, the ISA is open and royalty-free, and the on-chip firmware is open source and you can build it yourself. In this sense it's at least as open as, say, many RISC-V implementations.
No. The microarchitectures have some notable similarities and cross-pollinate each other, but they are distinct.
You may be thinking of IBM i (formerly known as AS/400 and i5), which has a completely abstracted instruction set that on modern systems is internally recompiled to Power.
I dunno, but the z-processors and the POWER processors look a lot different even from a floor plan / die shot perspective. The former also clock much higher. Doesn't smell like microcode to me.
I encountered assembly programs written and compiled in '88 and still running.
There are several drawbacks to maintaining this kind of compatibility but, nevertheless, it's impressive.
Book recommendation: in-depth on the people, processes, and technology. Incredible detail on all aspects.
https://direct.mit.edu/books/monograph/4262/IBM-s-360-and-Ea...
Thanks for the recommendation! I've ordered it.
The motto is unsurpassed security.
https://www.unisys.com/product-info-sheet/ecs/clearpath-mast...
Not just security, but CPU hotplugging and resuming as if nothing happened.
I'm not sure hotplugging is still there.
At least partially - the technical introduction for the Z17 says that several items can be concurrently maintained (IBM-speak for hot-swapped). So far as major items like the processing units - maybe (still reading).
(3.4mb PDF) https://www.redbooks.ibm.com/redbooks/pdfs/sg248580.pdf
> Four Power Supply Units (PSUs) that provide power to the CPC drawer and are accessible from the rear. Loss of one PSU leaves enough power to satisfy the power requirements of the entire drawer. The PSUs can be concurrently maintained
I would think their customers would demand zero downtime. And hey - if BeOS could power down CPUs almost 30 years ago I would expect a modern mainframe to be able to do it.
All servers have hot-swap power supplies.
(I'm pretty sure BeOS never actually powered off CPUs; it just didn't schedule anything. Linux "hotplug" works the same way today.)
> customers who cannot migrate off mainframes and are willing to spend almost any amount to keep their apps working.
They all can migrate their apps off the mainframes. It's just that it's cheaper to continue paying for the machines.
> given that no one seems to want to use them anymore
According to a 2024 Forrester Research report, mainframe use is as large as it's ever been, and expected to continue to grow.
Reasons include (not from the report) hardware reliability and scalability, and the ability to churn through crypto-style math in a fraction of the time/cost of cloud computing.
Report is paywalled, but someone on HN might have a free link.
"Crypto-style" here, I am guessing--but not entirely certain--from a downstream comment, is intended as cryptographic in the more general sense, and not "cryptocurrency" as "crypto"-alone is often used for these days?
Consider me skeptical. A mainframe has to be the least cost effective "crypto-style" math machine you could imagine.
They have a lot of cryptographic functionality built directly into hardware, and IBM is touting quantum resistant cryptography as one of the key features. You won't mine Bitcoin on one, but, if you are concerned a bad actor could get a memory or disk dump of your system and store it until quantum computers become practical, IBM says they have your back.
Those analyst reports are kind of bought and paid for by vendors BTW.
All these legacy answers don't really make sense for this Z17... it's a new mainframe supporting up to 64T of memory and specialized cores for matrix math/AI applications. I have a hard time believing that legacy workloads are calling for this kind of hardware.
It also has a ton of high availability features and modularity that _does_ fit with legacy workloads and mainframe expectations, so I'm a little unclear who wants to pay for both sets of features in the same package.
I've heard that the AI features are used by banks for fraud detection. I guess some banks are also growing their transaction volume.
I agree that many mainframe workloads are probably not growing so what used to require a whole machine probably fits in a few cores today.
You won't see mainframes doing AI training, but there is a lot of value in being able to do extremely low-latency inference (which is why they have their NPU on the same chip as the CPUs, just a few clock-cycles from the cores) during on-line transaction processing, and less timing-critical inference work on the dedicated cards (which are a few more nanoseconds away).
Additionally, IBM marketing likes the implication that mainframe CPUs are 'the best'. If you can buy a phone with an AI processor, it only makes sense that your mainframe must have one too. (And IBM will probably charge you to use it.)
If I were a bank, I'd order one of those and put all the financial modelling and prediction load on it. Like real time analysis to approve/deny loans, do projections for deciding what to do in slower moving financial products, predicting some future looking scenarios on wider markets, etc. etc. simultaneously.
That thing is a dreadnought matmul machine with some serious uptime, and can crunch numbers without slowing down or losing availability.
or, possibly, you can implement a massively parallel version of WOPR/Joshua on it and let it rip scenarios for you. Just don't connect to live systems (not entirely joking, though).
P.S.: I'd name that version of the Joshua as JS2/WARP.
The funny thing is that if they spun out half of the mainframe thing into something they could compete with people might actually buy them.
Most firms have so-so software in need of ultra reliable hardware; not everyone is Google.
> I don't mean this in a condescending way at all, but really out of sheer curiosity: Who uses mainframes nowadays and what for?
There's probably some minor strategic relevance here. E.g. for the government which has some workloads (research labs, etc.) that suit these systems, it's probably a decent idea not to try and migrate over to differently-shaped compute just to keep this CPU IP and its dev teams alive at IBM, to make sure that the US stays capable of whipping up high-performance CPU core designs even if Intel/AMD/Apple falter.
Those customers don't use mainframe, they use POWER. There's been a handful of POWER supercomputers in the past decade built for essentially that reason.
POWER is not uncommon in HPC, but IBM i (which is very enterprisey) is also based on POWER. You won't find IBM mainframes in HPC, but that's because HPC is not as sensitive to latency and reliability as online transaction processing, and, with mainframes, you are paying for that, not for TFLOPS.
I understand that a company I work with uses a few, and is migrating away from them.
It seems clear to me that prior to robust systems for orchestrating across multiple servers that you would install a mainframe to handle massive transactional workloads.
What I can never seem to wrap my head around is if there are still applications out there in typical business settings where a massive machine like this is still a technical requirement of applications/processes or if it's just because the costs of switching are monumental.
I'd love to understand as well!
Bank payment processing is the primary example - being able to tell whether a specific transaction is fraudulent in less than 100 milliseconds - but there are other businesses with similar requirements. Healthcare is one of them, and fraud detection is getting a lot more sophisticated with the on-chip NPUs within the same latency constraints.
Cloud is basically an infinitely scalable mainframe. You have dedicated compute optimised for specific workloads, and you string these together in a configuration that makes sense for your requirements.
If you understand the benefits of cloud over generic x86 compute, then you understand mainframes.
Cloud is mainframes gone full circle.
> Cloud is mainframes gone full circle.
Except that now you need to develop the software that gives mainframes their famed reliability yourself. The applications are very different: software developed for cloud always needs to know that part of the system might become unavailable and work around that. A lot of the stack, from the cluster management ensuring a failed node gets replaced and the processes running on them are spun up on another node, all the way up to your code that retries failed operations, needs to be there if you aim for highly reliable apps. With mainframes, you just pretend the computer is perfect and never fails (some exaggeration here, but not much).
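To make the "your code that retries failed operations" layer concrete, here is a minimal sketch of retry with exponential backoff. The `retry` helper and the `flaky_write` operation are hypothetical names made up for illustration, not part of any particular cloud SDK.

```python
import time

def retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on any exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * 2 ** attempt)  # wait 0.1s, 0.2s, 0.4s, ...

# Hypothetical flaky call: fails twice ("node unavailable"), then succeeds.
calls = {"n": 0}

def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("node unavailable")
    return "committed"

# The sleep is stubbed out here so the example runs instantly.
result = retry(flaky_write, sleep=lambda seconds: None)
```

Real code would also want idempotent operations and a cap on total delay; this only shows the shape of the extra logic the application has to carry.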
Also, reliability is just one aspect - another impressive feature is their observability features. Mainframes used to be the cloud back then and you can trace resource usage with exquisite detail, because we used to bill clients by CPU cycle. Add to that the hardware reliability features built-in (for instance, IBM mainframes have memory in RAID-like arrays).
But latency
The cache design in the Z is so different from cloud computing for collaborative job processes.
Pretty much every Fortune 500 company that's been around for more than 30 years. Batch processing primarily - from closing their books to order processing, differs by company. But if you ask the right person, they'll tell you where it's at.
Fortune 500? More like Fortune 50000 (ok, exaggeration). But there are so many banks in the world, and their automation can run back to the 1950s. They are only slowly moving away from mainframes, if only because a rewrite of a complex system that nobody understands is tough, and possibly very costly if it is the key to billions of euros/dollars/...
That's all true, but these machines often run Java code. That's something to contemplate.
IBM prices processors differently for running COBOL and Java - if you run mostly Java code, your monthly licensing fees will be vastly different. On boot, the service elements (their BMCs - on modern machines they are x86 boxes running Linux) load microcode in accordance with a saved configuration - some CPUs will run z/OS, some will run z/OS and COBOL apps, some will run Java, some will run z/VM, some will run Linux. This is all configured on the computer whose job is to bring up the big metal (first the power, then the cooling, and only then, the compute). Under everything on the mainframe side is the PR/SM hypervisor, which is, IIRC, what manages LPARs (logical partitions, completely isolated environments sharing the same hardware). The cheapest licensing is Linux under a custom z/VM (those machines aren't called z but LinuxONE), and the most expensive is the ability to run z/OS and COBOL code. Running Java under z/OS is somewhat cheaper. Last time I saw it, it was very complicated.
From what I've read, 70% of the Fortune 500 do.
Here's a brochure that might be useful to read:
https://www.ibm.com/downloads/documents/us-en/107a02e95d48f8...
It's an IBM brochure, so naturally it's pumping mainframes, but it still has lots of interesting facts in it.
Everyone's predisposed that "mainframes are obsolete", but why not use a mainframe?
I mean, no one except for banks can afford one, let alone make back on opex or capex, and so we all resort to MySQL on Linux, but isn't the cost the only problem with them?
> no one except for banks can afford one
Banks smaller than the big ~5 in the US cannot afford anything when it comes to IT infrastructure.
I am not aware of a single state/regional bank that wants to have their IBM on premise anymore - at any cost. Most of these customers go through multiple layers of 3rd party indirection and pay one of ~3 gigantic banking service vendors for access to hosted core services on top of IBM.
Despite the wildly ramping complexity of working with 3rd parties, banks still universally prefer this over the idea of rewriting the core software for commodity systems.
> I mean, no one except for banks can afford one,
A Rockhopper 4 Express, a z16 without z/OS support (running Linux) was in the mid 6 digits. It's small enough to co-locate on a rack with storage nodes. While z/OS will want IBM storage, Linux is much less picky.
IBM won't release the numbers, but I am sure it can host more workloads in the same chassis than the average 6-digit Dell or Lenovo.
Density is also bad. You spend 4 full racks and get 208 cores. Sure, they might be the fastest cores around, but that gets you only so far when even an off-the-shelf Dell server has 2x128-192 cores in a 1U server. Similarly 64 TB of RAM is a lot, but that same Dell can have 3 TB of RAM. If I'm reading the specs correctly (they are a bit confusing), z17 has only 25G networking (48 ports); the Dell I'm checking can have 8x200G network ports and can also do 400G networking. So the single 1U commodity server has more network bandwidth than the entire 4 rack z system.
Sure, there will be a lot of overhead in having tens to hundreds of servers vs. a single system, but for lots of workloads it is manageable and certainly worth the tradeoff.
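For anyone who wants to redo the comparison, here is the arithmetic as a small sketch. All figures are the ones quoted in the comment above and have not been checked against spec sheets; the 42U rack height is an assumption, and a reply further down questions whether 48 ports is the right count for the z17.

```python
# Figures as quoted in the comment; 42U per rack is an assumption.
z17_cores, z17_racks, rack_units = 208, 4, 42
dell_cores_low, dell_cores_high = 2 * 128, 2 * 192  # per 1U server

# Core density: cores per rack unit for the four-rack system.
z17_cores_per_u = z17_cores / (z17_racks * rack_units)

# Aggregate network bandwidth in Gb/s.
z17_net_gbps = 48 * 25    # 48 ports of 25G
dell_net_gbps = 8 * 200   # 8 ports of 200G

print(f"z17: {z17_cores_per_u:.2f} cores per U, {z17_net_gbps} Gb/s")
print(f"1U server: {dell_cores_low}-{dell_cores_high} cores per U, {dell_net_gbps} Gb/s")
```

This only compares raw counts; it says nothing about per-core speed or the availability features discussed elsewhere in the thread.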
> Dell server has 2x128-192 cores in 1U server.
Can you replace 25% of your cores without stopping the machine?
> that same Dell can have 3 TB of RAM.
How does it deal with a faulty memory module? Or two? Does it notice the issue before a process crashes?
> z17 has only 25G networking
They have up to 12 IO drawers with 20 slots each. I think the 48 ports you got are on the built-in switch.
The reliability of a mainframe surpasses a Dell server by a huge gap.
Not the only problem with them. Not as easy to find skilled staff to operate them. Also, you become completely dependent on IBM (not fully terrible -- it's a single throat to choke when things go wrong).
It's hard to choke someone's throat when they already hold you by the balls.
>Who uses mainframes nowadays and what for?
Do you have a credit card? Do you bank in the USA? If you answered "yes" to either of the above questions, you interact indirectly with a mainframe.
By that, do you mean banks, payment networks or both? And I guess I'd be curious as to why mainframes versus the rest. It seems like the answer for "why" is mainly because it started on mainframes and the cost of switching is really high, but I wonder if there isn't more to it.
Edit: Oh yeah, just saw MasterCard has some job posting for IBM Mainframe/COBOL positions. Fascinating.
Both. Mainframes though are incredibly good for both I/O and uptime.
Yeah, Linux/Unix are way better on both than they used to be, but on a mainframe, it's just a totally different level.
You can run Linux on mainframes fine. RHEL has first-class support for s390x / Z.
Not just that. Most operating systems lie about when an IO transaction completes for performance reasons. So if you lose power or the IO device dies you still think it succeeded. A mainframe doesn't do that... it also multiplexes the IO so it happens more than once so if one adapter fails it keeps going. The resiliency is the main use case in many situations. That said IME 99.995% of use cases don't need a mainframe. They just don't need to be that reliable if they can fail gracefully.
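On a commodity OS you can see the "lie" from userspace: a plain write returns once the data is in a buffer, not on the device. A minimal sketch of forcing the issue with `os.fsync` follows; the file name is made up, and even `fsync` only helps if the drive itself honors the flush.

```python
import os
import tempfile

def durable_write(path, data: bytes):
    """Write data and block until the OS reports it flushed to the device."""
    with open(path, "wb") as f:
        f.write(data)          # lands in Python's userspace buffer
        f.flush()              # pushed to the OS page cache, still volatile
        os.fsync(f.fileno())   # ask the kernel to flush to the storage device

# Hypothetical ledger file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "ledger.bin")
durable_write(path, b"debit 100\n")
```

Without the `flush` and `fsync` calls the function would return sooner, which is exactly the performance trade the comment describes.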
Do IBM mainframes still have the pricing model where you "buy" hardware and then still pay IBM for main processing?
(Where you can save money buying Linux or Java accelerators to run things on for free.)
I think what you are referring to is the "sub capacity" pricing model wherein a rolling average of resource consumption is used for billing. They've transitioned to newer models circa cloud technology, but it's mostly the same idea with more moving parts.
The advantage of this model from a business operations standpoint is that you don't have to think about a single piece of hardware related to the mainframe. IBM will come out automagically and fix issues for you when the mainframe phones home about a problem. A majority of the system architecture is designed on purpose to enable seamless servicing of the machine without impacting operations.
https://www.ibm.com/z/pricing
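As a rough illustration of billing on a rolling average, here is a sketch that finds the peak windowed average over a series of consumption samples. The 4-sample window, the function name, and the hourly figures are assumptions for illustration; the actual contract terms are more involved.

```python
from collections import deque

def peak_rolling_average(samples, window):
    """Return the highest average over any `window` consecutive samples."""
    recent = deque(maxlen=window)  # old samples fall off automatically
    peak = 0.0
    for value in samples:
        recent.append(value)
        if len(recent) == window:
            peak = max(peak, sum(recent) / window)
    return peak

# Hypothetical hourly consumption figures (arbitrary units).
hourly = [100, 120, 400, 380, 390, 410, 150, 90]
billable = peak_rolling_average(hourly, window=4)
```

The point is that a short spike matters less than a sustained plateau, which is why shops spend effort spreading batch work out over time.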
> IBM will come out automagically and fix issues for you when the mainframe phones home about a problem. A majority of the system architecture is designed on purpose to enable seamless servicing of the machine without impacting operations.
I'd rather have a fault-tolerant distributed software system running on commodity hardware, that way there's a plurality of hardware and support vendors to choose from. No lock-in.
> I'd rather have a fault-tolerant distributed software system running on commodity hardware, that way there's a plurality of hardware and support vendors to choose from. No lock-in.
But then you'd have to develop it yourself. IBM has been doing just that for 60 years (on the 360 and its descendants).
> distributed software system
What if the business demands a certain level of serialized transaction throughput that is incompatible with ideas like paxos?
You will never beat one fast machine at a serialized narrative, and it just so happens that most serious businesses require these semantics.
How much does downtime cost you per hour? What are the consequences if your services become unavailable?
That's like a toy drone company trying to compete with DARPA. Not even close.
These kinds of monsters run under critical environments such as airports, with AS400 or similar terminals being used by secretaries. These kinds of workloads, reliability, security, testing, are no joke. At all. This is not your general purpose Unix machine.
How much does it cost? I'm just curious. No, I don't want to book a meeting to "discuss" it.
I haven't worked with mainframes since the z10, but back then you could get into an entry model for about $100k.
Though the sky is the limit. The typical machine I would order had a list price of about 1 million. Of course no one pays list. Discounts can be pretty substantial depending on how much business you do with IBM or how bad they want to get your business.
The big problem is that everything in IBM-z world is negotiated, and often covered by NDAs. The pricing is complicated by which operating systems and what sort of workloads you'll be running, and what service level guarantees you need. The only published pricing in the entire life of the IBM 360/370/390/z-series line was the Linux One when it was first released... Hardware plus OS, excluding storage, was $70k on the low end.
Previous generation machines that came off-lease used to be listed on IBM's web site. You could have a fully-maxed-out previous-generation machine for under $250k. Fifteen years ago I was able to get ballpark pricing for a fully-maxed-out new machine, and it was "over a million, but less than two million, and closer to the low end". That being said, the machines are often leased.
If you go with z/VM or z/VSE, the OS and software is typically sold under terms that are pretty much like normal software, except it varies depending on the capacity level of the machine, which may be less than the physical number of CPUs in the machine, since that is a thing in IBM-land.
If you go for z/OS, welcome to the world of metered billing. You're looking at tens of thousands of dollars in MRC just to get started, and if you're running the exact wrong mix of everything, you'll be spending millions just on software each month. There's a whole industry that revolves around managing these expenses. Still less complicated than The Cloud.
You can get a software emulator for free and run it on a PC! It's quite robust and used by IBM's own developers. https://en.wikipedia.org/wiki/Hercules_(emulator)
Hercules is _not_ used by IBM's own developers. Being found with Hercules on your computer at IBM gets you in trouble. I know people who work on mainframe-related stuff inside IBM and they steer well clear of Hercules. And I've heard that IBM's computer monitoring stuff (antivirus, asset protection, etc.) looks for Hercules and flags it.
But IBM _does_ have their own mainframe emulator, zPDT (z Personal Development Tool), sold to their customers for dev and testing (under the name zD&T -- z Development and Test), and to ISVs under their ISV program. That's what IBM's own developers would be using if they're doing stuff under emulation instead of LPARs on real hardware.
(And IBM's emulator is significantly faster than Hercules, FWIW, but overall less feature-full and lacks all of the support Hercules has for older architectures, more device types, etc.)
> looks for Hercules and flags it.
There was something of a legal fight between IBM and TurboHercules SAS, a company that tried to force IBM to license z/OS to their users. IBM has been holding a grudge ever since (probably at the advice of their legal).
You can run the emulator but you will not get your hands on new versions of the operating system to run on it. But there are old versions that you can get your hands on.
You might be able to get your hands on a recent z/OS, but IBM will not be pleased.
You don't buy a mainframe, it's consumption based pricing. They aren't just going to list a price, because they need to size the hardware to what they think the workload will be.
Could they just list prices? Sure. Will they ever do it? No.
A Rockhopper 4 Express starts at $135,000. While technically a mainframe, it won't run z/OS.
It depends on how full those drawers are. $250k to $1m would be the typical price range.
It's easier and harder at the same time to buy older hardware. That's half the challenge though because the software is strictly licensed and you pay per MIPS.
Here's a kid who bought a mainframe and then brought it up:
https://www.youtube.com/watch?v=45X4VP8CGtk
If you have to ask, you can't afford it!
I think leasing is more common.
It's probably impossible to say because of the service contracts that come with it. Nobody would buy one brand new and not pay for support and consulting too.
Accounting is a key aspect as well. A lot of would-be capex that turns into opex that way.
I'm completely fascinated by the diagram. In a four rack system, 2.5 rack is dedicated to I/O, half a rack is just empty and the remaining is the actual processing and memory.
The I/O probably isn't endless networking adaptors, so what is it?
https://www.redbooks.ibm.com/abstracts/sg248579.html:
“The IBM z17 supports a PCIe I/O infrastructure. PCIe features are installed in PCIe+ I/O drawers. Up to 12 I/O drawers per IBM z17 can be ordered, which allows for up to 192 PCIe I/O and special purpose features.
For a four CPC drawer system, up to 48 PCIe+ fan-out slots can be populated with fan-out cards for data communications between the CPC drawers and the I/O infrastructure, and for coupling. The multiple channel subsystem (CSS) architecture allows up to six CSSs, each with 256 channels.
The IBM z17 implements PCIe Generation 5 (PCIe+ Gen5), which is used to connect the PCIe Generation 4 (PCIe+ Gen4) dual port fan-out features in the CPC drawers. The I/O infrastructure is designed to reduce processor usage and I/O latency, and provide increased throughput and availability.”
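The numbers in that excerpt work out as follows. This is just arithmetic on the quoted figures, with variable names of my own choosing.

```python
# Figures from the quoted Redbook excerpt.
io_drawers = 12
pcie_features = 192
css_count, channels_per_css = 6, 256

# PCIe features per I/O drawer and total addressable channels.
features_per_drawer = pcie_features // io_drawers
total_channels = css_count * channels_per_css

print(features_per_drawer, "features per drawer")
print(total_channels, "channels across all CSSs")
```

So the quoted limits come to 16 features per drawer and 1,536 channels in total.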
There's also the problem in that they need to take into account floor loading. They're not going to tell a customer upgrading from an older machine to a new one that, "oh, by the way, the rack weighs twice what it used to, so you'll need to upgrade your floor while you're at it." Especially important for raised floors.
Probably channels. In an IBM mainframe, each I/O device is connected on its own channel, which is actually a separate computer that handles the transfer to/from the main CPU. This has been the case going back to the System/360, which is why mainframes are legendary for their transaction throughput and reliability. There's probably a lot of redundancy in the I/O hardware, as they have to be rock solid and hot swappable while the system is running.
Could be storage, networking, crypto HSM, or cluster interconnect. See page 28 on https://www.redbooks.ibm.com/redbooks/pdfs/sg248580.pdf
I always enjoy reading those IBM Redbooks and learning about the technical details of these mainframe systems.
Reading about mainframes feels very much like reading science fiction. Truly awesome technology that exists on a completely different plane of computing than anything else.
They elevate hardware design to a fine art - everything is carefully balanced.
z Systems have always been amazing engineering feats. Too bad adopting it comes with a gargantuan amount of... IBM.
Yeah a few years ago there was a Talos (I think) desktop motherboard that had a POWER8 CPU in it. It was expensive due to low production runs but I wish it had taken off. I think IBM is up to POWER9 now, but I doubt if there are any personal motherboards for it.
> IBM is up to power 9 now, but I doubt if there are any personal motherboards for it.
The Talos II:
https://wiki.raptorcs.com/wiki/Talos_II
> EATX form factor
> Two POWER9-compatible CPU sockets accepting 4-/8-/18- or 22-core Sforza CPUs
"Entry" level is $5,800 USD.
There won't be a POWER10 version from them because of proprietary bits required
https://www.talospace.com/2023/10/the-next-raptor-openpower-...
> POWER10, however, contained closed firmware for the off-chip OMI DRAM bridge and on-chip PPE I/O processor, which meant that the principled team at Raptor resolutely said no to building POWER10 workstations, even though they wanted to.
https://www.osnews.com/story/137555/ibm-hints-at-power11-hop...
What are some of the reasons to buy or use these over Intel or AMD?
I bought mine - I own two Talos systems and a Blackbird - because I want choices in ISA, and if people want that, they'll need to spend the money. (It helps that I'm a notorious pro-PowerPC bigot, having used POWER since the RS/6000 MicroChannel days.) Other than ARM, they're the only architecture with both general OS support and processing power in the same ballpark, and while IBM may sometimes be dunderheaded, they aren't going anywhere, either.
They aren't cheap and they aren't for everyone. But it meets my needs and it puts my money where my mouth is.
If I understand you correctly it means the primary use case is security applications that need transparency at all levels on a system that is roughly equivalent to mainstream platform performance features. Is that accurate?
As I see it, these systems have two potential markets (some natural overlap exists): the first being people like me who don't want to necessarily feed the x86-64 beast if they can avoid it, and the second indeed being people who want a completely auditable, transparent platform. In both cases to be a practical alternative the computer needs to have similar performance to what you'd get from a modern desktop, and while POWER9 is no longer cutting edge, it still generally delivers.
If I could afford one I would, not because of security, but just the geek in me finds it cool.
Back in the 90s and early 2000s, there were several non-x86 architectures that were more powerful, and even 64 bit long before Intel ever did. The DEC alpha, SPARC, and others. I was also too poor to afford those back then but I remember them fondly.
> even 64 bit long before Intel ever did.
At some point I was reading e-mail on a 64-bit SGI machine while we waited for the Dell the company ordered for me to arrive.
The day it came in was one of the saddest days of my life.
I believe Intel and AMD motherboards all have proprietary firmware and the Talos systems are puritanically open. (But the processor itself is closed so there's that. Could have gone with SPARC Niagara which was open sourced https://www.cnet.com/tech/computing/sun-open-sources-second-... )
> But the processor itself is closed
The microarch is closed and IBM-specific. However, the ISA is open and royalty-free, and the on-chip firmware is open source and you can build it yourself. In this sense it's at least as open as, say, many RISC-V implementations.
The Sun Niagara 2 even has the Verilog RTL available. That is several orders of magnitude more open than the IBM.
> Verilog RTL for OpenSPARC T2 design
> Verification environment for OpenSPARC T2
> Diagnostics tests for OpenSPARC T2
> Scripts and Sun internal tools needed to simulate the design and to do synthesis of the design
> Open source tools needed to simulate the design
https://www.oracle.com/servers/technologies/opensparc-t2-pag...
Unfortunately, going from there to an affordable chip with reasonable performance doesn't seem possible.
That's Raptor Computing Systems (www.raptorcs.com) now selling Talos II workstations.
Being somewhat pedantic, Power and z are completely different architectures.
Completely? Isn't Z just microcoded on top of POWER under the hood?
No. The microarchitectures have some notable similarities and cross-pollinate each other, but they are distinct.
You may be thinking of IBM i (formerly known as AS/400 and i5), which has a completely abstracted instruction set that on modern systems is internally recompiled to Power.
No, Power and Telum processors are very different internally.
I dunno, but the z-processors and the POWER processors look a lot different even from a floor plan / die shot perspective. The former also clock much higher. Doesn't smell like microcode to me.
To be fair, AWS copied the concept of mainframe with x86 and "commodity" hardware, including the consumption based billing.
They forgot to copy the 99.99999% availability. :-/
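For a sense of scale, here is what various availability percentages allow in downtime per year. A quick sketch, assuming a 365-day year; the function name is mine.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a non-leap year

def downtime_seconds_per_year(availability_percent):
    """Convert an availability percentage into allowed downtime per year."""
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR

for pct in (99.9, 99.99, 99.99999):
    print(f"{pct}% -> {downtime_seconds_per_year(pct):.1f} s/year")
```

Seven nines comes to roughly three seconds a year, versus nearly an hour for four nines.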
I really don't understand why in 4 racks you can have only 4 CPU drawers and 12 I/O drawers. This seems like their IO is incredibly inefficient.
Have you seen the size of those drawers? A single rack can only fit five of them, and you still would need to add processing, power/UPS, and cooling.
The CPU drawers are 5U while the I/O drawers are 8U. https://higherlogicdownload.s3.amazonaws.com/IMWUC/UploadedI...
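Putting those drawer heights together gives a rough space budget. This sketch assumes standard 42U racks, which is my assumption and not something stated in the thread; the drawer counts and heights are the ones quoted above.

```python
RACK_UNITS = 42  # assumption: standard 42U racks
racks = 4
cpc_drawers, cpc_height_u = 4, 5    # CPU drawers, 5U each
io_drawers, io_height_u = 12, 8     # I/O drawers, 8U each

drawer_units = cpc_drawers * cpc_height_u + io_drawers * io_height_u
total_units = racks * RACK_UNITS
leftover_units = total_units - drawer_units   # power, cooling, switches, etc.
io_drawers_per_rack = RACK_UNITS // io_height_u

print(f"{drawer_units}U of {total_units}U used by drawers, {leftover_units}U left")
print(f"at most {io_drawers_per_rack} I/O drawers fit in one rack")
```

Under that assumption the drawers take 116U of 168U, and five 8U drawers per rack matches the "only fit five of them" remark.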