Apple M1 for Dummies, for Dummies

Why ARM is the future of semiconductors

I came across this incredible article that explained in layman’s terms why Apple’s new M1 semiconductor chips are so superior to Intel’s or AMD’s top-tier chips:

The article does an excellent job of dumbing down the technical intricacies behind why Apple managed to make a superior class of semiconductor chip versus its existing competition. But it’s still pretty long - which just goes to show how complex the technology underlying this industry is. So I’ve done you time-strapped guys a favor, and dumbed down the already ‘dumbed-down’ version even further. Hence the title of this article, ‘Apple M1 for Dummies, for Dummies’.

Caveat: This is a dumbed-down version of the already dumbed-down explanation, so it will understandably not be totally comprehensive. I’m just trying to highlight the ultra-critical details. For further detail, I recommend reading the actual thing (also linked above).

Why is the Apple M1 better than the existing competition (i.e. Intel/AMD)?

Most people outside the semiconductor industry are conditioned to think that more powerful chips are entirely due to higher transistor density, i.e. smaller transistors on the chips. Indeed, that’s been the main way computers have achieved faster speeds since the birth of the semiconductor industry. However, things have quite recently changed.

For one, we are reaching the limits of how small transistors can become. To simplify, the transistors on semiconductor chips process computer instructions using electricity - so the more transistors you can cram on a chip, the more instructions you can process, and the faster your computer can become. However, we have now reached the point where the smallest parts of a transistor are only tens of atoms across. At that scale, electrons (electricity is basically moving electrons) start leaking through barriers that are supposed to stop them, and the whole system starts to break down if the transistors get much smaller. So legacy semiconductor chips can’t really improve their performance by leaps and bounds anymore - at least not by traditional standards.

The other thing to understand is that putting smaller transistors on a chip requires better and more expensive machines to build them. Indeed, these semiconductor factories (known as ‘foundries’) are so prohibitively expensive to build that right now only 2 companies (or 3, depending on who you ask) are capable of producing the smallest 5 nanometer (‘nm’) chips. These are Samsung and TSMC. Intel is the potential 3rd, but they’re still some way off from achieving 5nm capabilities.

Hence, when people try to understand why Apple’s new M1 chip outclasses existing Intel and AMD top-tier chips, they naturally start by asking how Apple’s M1 managed to achieve even smaller transistor sizes. This would make logical sense, since Apple merely designs the M1 chip, and sends the designs to TSMC (which has the technology) to manufacture the actual physical chip.

But what the above article is trying to convey is - no. No, the Apple M1’s outperformance over existing competition is not because Apple has somehow managed to cram more transistors into its chip than Intel/AMD (i.e. better technology). It is because of how Apple approached the design of its chip (i.e. better process engineering).

Background Context

When I said that the ARM revolution was a result of better process engineering rather than better technology, what I meant was that the Apple M1 managed to do things differently in a creative way (i.e. in how the chip is designed), not that it is using better manufacturing technology (e.g. new EUV tech vs. old patterning tech). The Apple M1's revolutionary improvements in both performance and efficiency (usually you have to choose one or the other) aren't because it is made with more advanced EUV lithography technology, as most people suspect. They're probably using the same EUV machines, but they approached the construction of the semiconductor chip in an entirely different way from the bottom up.

To understand this, you need to have some background context. Our standard PC chips run on a computer "language" called x86 or x64 (a Complex Instruction Set Computer, or CISC, design), while mobile phone chips mostly run on the ARM language (a Reduced Instruction Set Computer, or RISC, design). I will use x86/x64 interchangeably with CISC, and likewise for ARM with RISC.

Basically, RISC chips have traditionally consumed a lot less power at the expense of performance, which makes them useful for mobile phones with smaller batteries, but also means phones aren't as powerful as PCs. Conversely, PC chips can run more complex instructions quicker but are much more power hungry, which doesn't matter as much as it does for phones, since PCs either have a much larger battery or are plugged into a wall.

If you go to YouTube and search "Apple M1", you'll find a whole bunch of tech reviewers wowing over the M1's (RISC chip) ability to match the top-tier Intel/AMD chips (CISC chips) - and do it while sipping power like a phone. So not only does it have the advantages of RISC (less power consumption), it also has the advantages of CISC (higher performance).

Why It Matters From A Business Perspective

I won't go into the engineering details of why the Apple M1 is superior to everything that came before it (in terms of both performance and efficiency - read the article linked above if you want to find out), but I will talk about why it matters from a business perspective - i.e. why Intel and AMD can't just copy it. At the end of the day, business outperformance depends on being superior to your competition, not just having a superior product - so the question to ask is not why Apple is superior to Intel/AMD today, but why can't Intel/AMD just copy Apple and render its advantage nil by this time next year.

Basically, x86 and ARM are not the most fundamental (i.e. lowest) level of semiconductor "language". There is actually an even more fundamental level of "language" called "micro-operations" (micro-ops), which sits below the x86 or ARM level. You can think of each x86 or ARM instruction being translated ("decoded") down into one or more micro-ops, which are what the chip actually executes.
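To make the idea of a lower level of "language" a bit more concrete, here is a toy Python sketch. The instruction names and their micro-op breakdowns are hypothetical and purely for illustration; real chips use their own internal micro-ops, which differ from one chip design to another.

```python
# Hypothetical lookup table: instruction name -> the micro-ops it breaks into.
# These names are made up for illustration; they are not real x86 or ARM encodings.
MICRO_OPS = {
    "ADD_REG_REG": ["add"],                    # simple instruction: one micro-op
    "ADD_MEM_REG": ["load", "add", "store"],   # more complex one: several micro-ops
}

def decode_to_micro_ops(program: list[str]) -> list[str]:
    """Translate each higher-level instruction into its lower-level micro-ops."""
    micro_ops = []
    for instruction in program:
        micro_ops.extend(MICRO_OPS[instruction])
    return micro_ops

program = ["ADD_REG_REG", "ADD_MEM_REG"]
result = decode_to_micro_ops(program)
print(result)  # ['add', 'load', 'add', 'store']
```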

Why is this important? Basically, under ARM, the incoming instructions can be "chopped up" and translated into micro-ops in a consistent and scalable way, several at a time. However, because of the way x86 is designed, this "chopping" technique hits a brick wall under x86. The advantage of being able to "chop" quickly is that you can keep the chip fed with enough micro-ops to run many of them in parallel (loosely like a GPU, for the techies). Think of it as expanding the number of lanes on a highway so that you can fit more cars at the same time, rather than having a car which travels faster. If you have enough cars running at a slower speed on many lanes in parallel, you can get more cars to the end of the road faster than if you had a very fast car and just one lane.

So basically, the reason why the Apple M1 (a RISC chip) is so much better than even the top-tier Intel/AMD chips (i.e. CISC chips) from an aggregate performance:efficiency ratio standpoint is that the former has 2x the number of lanes as the latter at the micro-ops level. So even though the M1's "cars" are slightly slower, it can serve twice the number of "cars" as the Intel/AMD chips can - which makes the M1 fly past Intel/AMD on an aggregate performance basis (i.e. performance:efficiency ratio).
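To put rough numbers on the highway analogy, here is a quick Python sketch. The lane counts and speeds are made up for illustration (they are not measurements of any real chip); the only thing taken from the text above is the 2x lane ratio, and the 20% slowdown is just my assumption for "slightly slower".

```python
def cars_per_hour(lanes: int, cars_per_lane_per_hour: int) -> int:
    """Total throughput = number of lanes x how many cars each lane moves per hour."""
    return lanes * cars_per_lane_per_hour

# Hypothetical figures: the "fast" highway has 4 lanes; the "wide" highway
# has 2x the lanes, but each lane moves 20% fewer cars per hour.
fast_narrow = cars_per_hour(lanes=4, cars_per_lane_per_hour=1000)
slow_wide = cars_per_hour(lanes=8, cars_per_lane_per_hour=800)

print(fast_narrow)  # 4000
print(slow_wide)    # 6400
```

Under these assumed numbers, the wider highway still moves 60% more cars per hour, even though every individual lane is slower.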

And the key thing to understand from the business perspective is this - CISC chips cannot copy this advantage, because of the way they are designed. So in order for Intel/AMD to catch up, they have to redesign their entire business from scratch - a monumental task to say the least. This means that, as new ARM competitors spring up following in the Apple M1's footsteps, they will leave the legacy CISC giants Intel/AMD in the dust, since the latter can't change fast enough to move the needle.

This is the definition of industry disruption, and why Intel/AMD are in an extremely precarious state today with the introduction of the Apple M1.


(Some people asked for a further explanation about why Intel or AMD can’t simply copy the Apple M1’s process advantages by throwing more money at the problem, given their deep pockets. Here’s my answer:)

The way the ARM (or RISC) “language” is designed is that all the ARM instructions are consistent: every instruction is the same length (4 bytes on the M1), so each one starts at a predictable position. This allows you to chop up one long stream of instructions at fixed intervals, send the chopped-up pieces to multiple decoders (the “choppers”) at the same time, then put the results back together in order. Thus if you have 5 decoders, you can chop up and decode 5 instructions at the same time, i.e. up to 5x faster.
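The fixed-interval chopping described above can be sketched in a few lines of Python. This is a toy model under my own assumptions: the `decode` function is a hypothetical stand-in that just labels each piece, and the thread pool only illustrates that the pieces are independent of one another (it is not a claim about how the hardware works, and Python threads won't show a real speed-up here).

```python
from concurrent.futures import ThreadPoolExecutor

INSTRUCTION_SIZE = 4  # bytes; 64-bit ARM instructions are all this length

def chop_fixed(stream: bytes, size: int = INSTRUCTION_SIZE) -> list[bytes]:
    """Cut the stream every `size` bytes. No piece depends on any other piece."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]

def decode(chunk: bytes) -> str:
    """Hypothetical stand-in for a real decoder: label the chunk by its hex value."""
    return f"op_{chunk.hex()}"

stream = bytes(range(20))  # 20 bytes of dummy data = 5 "instructions"
chunks = chop_fixed(stream)

# Every boundary is known up front, so all 5 chunks can be handed to
# 5 workers at once. map() returns the results in the original order.
with ThreadPoolExecutor(max_workers=5) as pool:
    decoded = list(pool.map(decode, chunks))

print(len(chunks))   # 5
print(decoded[0])    # op_00010203
```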

In contrast, x86 (or CISC) instructions are not a consistent length: each one can be anywhere from 1 to 15 bytes long. So the chip has to work through the stream as a sequence, i.e. find where instruction 1a ends before it knows where instruction 1b starts, then where 1b ends before it knows where 1c starts, and so on. You can’t simply chop the stream into 1a, 1b and 1c at fixed positions and send them to multiple decoders at once (like ARM), because the decoders would not know where each piece starts. So CISC chips end up having to either do a lot of guesswork or use brute-force techniques (e.g. trying to decode at every possible starting position and throwing away the wrong attempts) if they want to apply the same trick, which makes each additional decoder more complicated and offsets much of the potential gain from chopping up the instructions in the first place.
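By contrast, here is a sketch of why chopping becomes a step-by-step job when the pieces are not a consistent size. The encoding below is a toy I made up for illustration, where the first byte of each instruction states its total length. It is NOT how real x86 encodes instructions (real x86 length depends on several parts of the instruction), but it shows the same dependency: you can't find the next piece until you've looked at the current one.

```python
def chop_variable(stream: bytes) -> list[bytes]:
    """Split a stream where byte 0 of each instruction gives its total length.

    Toy encoding for illustration only, not real x86.
    """
    pieces = []
    pos = 0
    while pos < len(stream):
        length = stream[pos]  # we must read this instruction first...
        if length == 0 or pos + length > len(stream):
            raise ValueError(f"bad length {length} at offset {pos}")
        pieces.append(stream[pos:pos + length])
        pos += length         # ...before we know where the next one starts
    return pieces

# Four instructions of lengths 2, 3, 1 and 4 bytes, packed back to back.
stream = bytes([2, 0xAA, 3, 0xBB, 0xCC, 1, 4, 0xDD, 0xEE, 0xFF])
pieces = chop_variable(stream)
lengths = [len(p) for p in pieces]
print(lengths)  # [2, 3, 1, 4]
```

Notice that the loop has no way to jump straight to the third instruction: its starting offset only becomes known after the first two have been measured.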

The reason why the Apple M1 (i.e. an ARM chip) is so fast is that it can chop up the incoming instruction stream into bite-sized pieces and translate them into micro-ops (i.e. the level below ARM) in parallel, as discussed above. But the real magic is that the Apple M1 has 2x the number of “choppers” (the decoders that chop the stream up into individual instructions) as the top-tier Intel or AMD chips (i.e. CISC chips). So in this simplified picture, not only can it theoretically chop up a stream 5x faster (assuming 5 choppers working in parallel), but doubling the number of choppers doubles that again - for a theoretical gain of 10x on the chopping step. Real-world gains are much smaller, since chopping is only one stage of the chip, but you can easily see how scaling the number of “choppers” can lead to very large performance gains.

On top of that, because of the way CISC chips like Intel’s or AMD’s are fundamentally designed (i.e. sequential processing), they cannot copy the same improved process that the Apple M1 has pioneered. So they can either dump all their billions of dollars of sunk costs and restart from scratch, or be resigned to being disrupted by the new ARM revolution when new M1 copycats enter the market over the next decade. It’s a tough decision for them either way.