When it comes to generative AI, Apple’s efforts have seemed largely concentrated on mobile — namely Apple Intelligence running on iOS 18, the latest operating system for the iPhone.
But as it turns out, the new Apple M4 computer chip — available in the new Mac Mini and MacBook Pro models announced at the end of October 2024 — is excellent hardware for running the most powerful open source foundation large language models (LLMs) yet released, including Meta’s Llama-3.1 405B, Nvidia’s Nemotron 70B, and Qwen 2.5 Coder-32B.
In fact, Alex Cheema, co-founder of Exo Labs, a startup founded in March 2024 to (in his words) “democratize access to AI” through open source multi-device computing clusters, has already done it.
As he shared on the social network X recently, the Dubai-based Cheema connected four Mac Mini M4 devices (retail value of $599.00 each) plus a single MacBook Pro M4 Max (retail value of $1,599.00) with Exo’s open source software to run Alibaba’s software developer-optimized LLM Qwen 2.5 Coder-32B.
After all, at a total retail cost of around $5,000, Cheema’s cluster is still significantly cheaper than even a single coveted Nvidia H100 GPU (retail price of $25,000-$30,000).
The value of running AI on local compute clusters rather than the web
While many AI consumers are used to visiting websites such as OpenAI’s ChatGPT or using mobile apps that connect to the web, there are significant cost, privacy, security, and behavioral benefits to running AI models locally on devices the user or enterprise controls and owns, without a web connection.
Cheema said Exo Labs is still working on building out its enterprise-grade software offerings, but he’s aware of several companies already using Exo software to run local compute clusters for AI inference — and believes it will spread from individuals to enterprises in the coming years. For now, anyone with coding experience can get started by visiting Exo’s GitHub repository and downloading the software themselves.
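To give a rough sense of what that looks like in practice, here is a minimal sketch of sending a prompt to a locally running Exo cluster through an OpenAI-style chat completions endpoint. The port, endpoint path, and model identifier shown here are assumptions for illustration; the repo’s README documents the actual values for a given release.

```python
# Minimal sketch: querying a locally running Exo cluster through an
# OpenAI-style chat completions endpoint. The port (52415), path, and
# model name are assumptions -- check Exo's README for the values your
# installed version actually uses.
import json
import urllib.request

ENDPOINT = "http://localhost:52415/v1/chat/completions"  # assumed local address

payload = {
    "model": "qwen-2.5-coder-32b",  # hypothetical model identifier
    "messages": [
        {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
    "temperature": 0.2,
}

request = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    result = json.load(response)

# Standard OpenAI-style response shape: the first choice holds the reply.
print(result["choices"][0]["message"]["content"])
```

Because the request never leaves the local network, the prompt and the model’s reply stay on hardware the user controls.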
“The way AI is done today involves training these very large models that require immense compute power,” Cheema explained to VentureBeat in a video call interview earlier today. “You have GPU clusters costing tens of billions of dollars, all connected in a single data center with high interconnects, running six-month-long training sessions. Training large AI models is highly centralized, limited to a few companies that can afford the scale of compute required. And even after the training, running these models effectively is another centralized process.”
By contrast, Exo hopes to allow “people to own their models and control what they’re doing. If models are only running on servers in massive data centers, you lose transparency and control over what’s happening.”
Indeed, as an example, he noted that he fed his own direct and private messages into a local LLM so he could ask it questions about those conversations, without fear of them leaking onto the open web.
“Personally, I wanted to use AI on my own messages to do things like ask, ‘Do I have any urgent messages today?’ That’s not something I want to send to a service like GPT,” he noted.
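As a rough illustration of that workflow, the sketch below packs a handful of locally stored messages into a prompt and sends them to a local cluster rather than a hosted service. The base URL, model name, and sample messages are all assumptions made for illustration, not Exo’s documented interface.

```python
# Minimal sketch of the private-messages use case described above:
# local message history is packed into the prompt and sent only to a
# local server, never to a hosted service. The base_url, port, and
# model name are assumptions; the messages list is made-up sample data.
from openai import OpenAI  # standard OpenAI client, pointed at a local server

client = OpenAI(
    base_url="http://localhost:52415/v1",  # assumed local endpoint
    api_key="not-needed-for-local",        # local servers typically ignore this
)

# Hypothetical sample of locally stored messages.
todays_messages = [
    "09:02 Alice: Can you review the contract before noon?",
    "10:15 Bob: Lunch tomorrow?",
    "11:40 Carol: URGENT - server costs doubled this month.",
]

prompt = (
    "Here are today's messages:\n"
    + "\n".join(todays_messages)
    + "\n\nDo I have any urgent messages today? Answer briefly."
)

response = client.chat.completions.create(
    model="qwen-2.5-coder-32b",  # hypothetical local model identifier
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```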
Using M4’s speed and low power …