
AI@Home: New Hardware

As written in this post, I preordered a Framework Desktop. I’m about to cancel that order, and here is why.

For the past few months, I’ve been deeply involved in experimenting with Large Language Models (LLMs), both running them locally and exploring different hardware options. I was extremely excited about the Framework Desktop. The promise of a lot of shared memory, modularity, a very capable APU (CPU & GPU), a limited but real degree of extensibility and repairability, and the complete absence of affordable alternatives from Nvidia resonated deeply with my values. It seemed to me like a very good compromise between price and value – and it definitely is. However, after careful consideration – and a particularly revealing Twitter post – I ultimately decided on a Mac Studio with the M4 Max chip.

Let’s be clear: the Framework Desktop is a fantastic piece of engineering, and I would still love to play with it – not only in the sense of playing with AI but also quite literally playing games. I also like the ethos behind it and the commitment to empowering users. For general-purpose computing, it’s a compelling option.

The Turning Point: Gemma 3 27B & Token Throughput

The deciding factor came down to performance, specifically when running LLMs. Framework recently showcased the Gemma 3 27B model running on their desktop, and it sparked a direct comparison I couldn’t ignore. In this X post, the Framework Desktop delivered 9.33 tokens/second when asked to describe a given photo. This was the first performance indication I had seen, and Framework correctly added the disclaimer that the setup is not optimized. Kudos to Framework for being open and honest about this!

Nevertheless, I ran the very same test as demonstrated on the Framework Desktop on a new MacBook Pro with an M4 Max (16/40/16) processor and 48GB of RAM. The MacBook delivered 20.91 tokens/second – more than twice as fast as the Framework Desktop. This result was striking.

For those unfamiliar, tokens per second is a key metric for LLM performance. A higher rate means faster response times, smoother interactions, and a more fluid experience when generating text. In my view, 20 tokens/second is where things begin to be usable, so this difference wasn’t marginal; it was significant enough to impact my workflow and the feasibility of certain experiments.
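To make the gap concrete, here is a back-of-the-envelope sketch of how long a typical answer takes at the two measured rates. The response length of 500 tokens is my own assumption, not a figure from either test.

```python
# Back-of-the-envelope: time to generate one answer at the two measured rates.
def generation_time(num_tokens: float, tokens_per_second: float) -> float:
    """Seconds needed to generate num_tokens at the given throughput."""
    return num_tokens / tokens_per_second

response_tokens = 500  # assumption: a medium-length answer

for label, tps in [("Framework Desktop", 9.33), ("MacBook Pro M4 Max", 20.91)]:
    print(f"{label}: {generation_time(response_tokens, tps):.0f} s")
# → Framework Desktop: 54 s
# → MacBook Pro M4 Max: 24 s
```

Waiting nearly a minute for a single answer versus under half a minute is exactly the difference between "usable" and "frustrating" in an interactive workflow.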

In the end I decided to go for an Apple Mac Studio with 128GB RAM and 1TB Drive. Here’s a breakdown of my decision-making process:

Pro: Mac Studio with M4 Max

  • Raw Performance: The M4 Max’s higher memory bandwidth is a clear advantage when dealing with the massive datasets and complex calculations involved in running LLMs. This translates directly into faster processing and better performance.
  • Immediate Availability: I wanted a machine now. The Mac Studio was readily available, while the Framework Desktop will be available no earlier than Q3 2025.
  • Resale Value: Apple products generally hold their value well. I anticipate a stable resale price down the line, which is a practical consideration.
  • Aesthetics: Let’s be honest, the Mac Studio is a beautifully designed machine. While subjective, it’s a nice bonus.

Contra: Mac Studio with M4 Max

  • Price: The Mac Studio is significantly more expensive than a comparable Framework Desktop configuration. This was a major consideration, and I weighed the cost against the performance gains. However, because my son is a student, I was able to take advantage of the education discount. But still, the price hurts very badly.
  • No Gaming: I also like to play a game here and there, and this would have been really nice to get from the Framework. But availability and memory bandwidth/performance for LLMs were more important.

Why Memory Bandwidth Matters for LLMs

LLMs are incredibly memory-intensive. They require constant access to vast amounts of data. The M4 Max’s superior memory bandwidth of 546 GB/s allows it to feed data to the chip much faster, minimizing bottlenecks and maximizing performance. This is where the Framework Desktop, while capable, fell short with “only” 256 GB/s. And this difference directly materializes when comparing both for LLMs.
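The bandwidth argument can be sketched numerically. During decoding, a bandwidth-bound LLM has to stream roughly the whole model through memory for every generated token, so tokens/second is bounded by bandwidth divided by model size. The ~17 GB footprint below is my assumption for a 4-bit-quantized Gemma 3 27B, not a measured value; real throughput lands below this bound because of activations and other overhead.

```python
# Rough upper bound for decode throughput on a memory-bandwidth-bound LLM:
# each generated token streams (roughly) the full model through memory,
# so tokens/s <= bandwidth / model size.
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

model_gb = 17.0  # assumed footprint of a 4-bit quantized Gemma 3 27B

for label, bw in [("Framework Desktop (256 GB/s)", 256.0),
                  ("M4 Max (546 GB/s)", 546.0)]:
    print(f"{label}: <= {max_tokens_per_second(bw, model_gb):.1f} tokens/s")
# → Framework Desktop (256 GB/s): <= 15.1 tokens/s
# → M4 Max (546 GB/s): <= 32.1 tokens/s
```

Whatever the exact model size, it cancels out in the ratio: 546/256 ≈ 2.1×, which lines up neatly with the roughly 2.2× gap I measured between the two machines.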
