
AI@Home: New Hardware

As written in this post, I preordered a Framework Desktop. I’m about to cancel that order, and here is why.

For the past few months, I’ve been deeply involved in experimenting with Large Language Models (LLMs), both running them locally and exploring different hardware options. I was extremely excited about the Framework Desktop. The promise of a lot of shared memory, modularity, a very capable APU (CPU & GPU), a limited but real degree of extensibility and repairability, and the complete absence of affordable alternatives from Nvidia resonated deeply with my values. It seemed to me like a very good compromise between price and value – and it definitely is. However, after careful consideration – and a particularly revealing Twitter post – I ultimately decided on a Mac Studio with the M4 Max chip.

Let’s be clear: the Framework Desktop is a fantastic piece of engineering, and I would still love to play with it – not only in the sense of playing with AI but also quite literally playing games. I also like the ethos behind it and the commitment to empowering users. For general-purpose computing, it’s a compelling option.

The Turning Point: Gemma 3 27B & Token Throughput

The deciding factor came down to performance, specifically when running LLMs. Framework recently showcased the Gemma 3 27B model running on their desktop, and it sparked a direct comparison I couldn’t ignore. In this X post, the Framework Desktop delivered 9.33 tokens/second when asked to describe a given photo. This was the first performance indication I had seen, and Framework correctly added the disclaimer that the setup is not optimized. Kudos to Framework for being open and honest about this!

Nevertheless, I ran the very same test as demonstrated on the Framework Desktop on a new MacBook Pro with an M4 Max (16/40/16) processor and 48GB of RAM. The MacBook delivered 20.91 tokens/second – more than twice as fast as the Framework Desktop. This result was striking.

For those unfamiliar, tokens per second is a key metric for LLM performance. A higher rate means faster response times, smoother interactions, and a more fluid experience when generating text. In my view, 20 tokens/second is where things begin to be usable, so this difference wasn’t marginal; it was significant enough to impact my workflow and the feasibility of certain experiments.
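To make the gap concrete, here is a back-of-the-envelope sketch of how long a typical answer takes at the two measured rates. The response length of 500 tokens is my own assumption, not a figure from either test.

```python
# Back-of-the-envelope: time to generate one answer at the two measured rates.
def generation_time(num_tokens: float, tokens_per_second: float) -> float:
    """Seconds needed to generate num_tokens at the given throughput."""
    return num_tokens / tokens_per_second

response_tokens = 500  # assumption: a medium-length answer

for label, tps in [("Framework Desktop", 9.33), ("MacBook Pro M4 Max", 20.91)]:
    print(f"{label}: {generation_time(response_tokens, tps):.0f} s")
# → Framework Desktop: 54 s
# → MacBook Pro M4 Max: 24 s
```

Waiting nearly a minute for a single answer versus under half a minute is exactly the difference between "usable" and "frustrating" in an interactive workflow.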

In the end I decided to go for an Apple Mac Studio with 128GB RAM and 1TB Drive. Here’s a breakdown of my decision-making process:

Pro: Mac Studio with M4 Max

  • Raw Performance: The M4 Max’s higher memory bandwidth is a clear advantage when dealing with the massive datasets and complex calculations involved in running LLMs. This translates directly into faster processing and better performance.
  • Immediate Availability: I wanted a machine now. The Mac Studio was readily available, while the Framework Desktop will be available no earlier than Q3 2025.
  • Resale Value: Apple products generally hold their value well. I anticipate a stable resale price down the line, which is a practical consideration.
  • Aesthetics: Let’s be honest, the Mac Studio is a beautifully designed machine. While subjective, it’s a nice bonus.

Contra: Mac Studio with M4 Max

  • Price: The Mac Studio is significantly more expensive than a comparable Framework Desktop configuration. This was a major consideration, and I weighed the cost against the performance gains. However, because my son is a student, I was able to take advantage of the education discount. But still, the price hurts very badly.
  • No Gaming: I also like to play a game here and there, and this would have been really nice to get from the Framework. But availability and memory bandwidth/performance for LLMs were more important.

Why Memory Bandwidth Matters for LLMs

LLMs are incredibly memory-intensive. They require constant access to vast amounts of data. The M4 Max’s superior memory bandwidth of 546 GB/s allows it to feed data to the chip much faster, minimizing bottlenecks and maximizing performance. This is where the Framework Desktop, while capable, fell short with “only” 256 GB/s. And this difference directly materializes when comparing both for LLMs.
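The bandwidth argument can be sketched numerically. During decoding, a bandwidth-bound LLM has to stream roughly the whole model through memory for every generated token, so tokens/second is bounded by bandwidth divided by model size. The ~17 GB footprint below is my assumption for a 4-bit-quantized Gemma 3 27B, not a measured value; real throughput lands below this bound because of activations and other overhead.

```python
# Rough upper bound for decode throughput on a memory-bandwidth-bound LLM:
# each generated token streams (roughly) the full model through memory,
# so tokens/s <= bandwidth / model size.
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

model_gb = 17.0  # assumed footprint of a 4-bit quantized Gemma 3 27B

for label, bw in [("Framework Desktop (256 GB/s)", 256.0),
                  ("M4 Max (546 GB/s)", 546.0)]:
    print(f"{label}: <= {max_tokens_per_second(bw, model_gb):.1f} tokens/s")
# → Framework Desktop (256 GB/s): <= 15.1 tokens/s
# → M4 Max (546 GB/s): <= 32.1 tokens/s
```

Whatever the exact model size, it cancels out in the ratio: 546/256 ≈ 2.1×, which lines up neatly with the roughly 2.2× gap I measured between the two machines.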
