Dreaming of running large AI models on your own hardware but lacking the budget for a datacenter? Meet Exo, an open-source project that turns your existing devices into a distributed AI cluster, pooling their computing power to run capable open-source models like LLaMA, Mistral, and Qwen.
Exo intelligently analyzes the memory and resources of each connected device, then partitions the AI model into smaller pieces distributed across the network. The more devices you add, the greater your cluster’s capacity becomes. Whether it’s your MacBook, iPhone, Android tablet, or even a Raspberry Pi, Exo can harness their combined power.
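To make the partitioning idea concrete, here is an illustrative sketch, not Exo's actual algorithm: given each device's memory, assign each one a contiguous slice of the model's layers in proportion to its share of the total. The device names, memory sizes, and layer count below are made-up example values.

```shell
#!/usr/bin/env bash
# Illustrative only -- NOT Exo's real partitioning code.
# Split a 32-layer model across devices proportionally to their RAM.
layers=32
devices=(macbook:16 mac_mini:8 raspberry_pi:8)   # name:GB of RAM (example values)

# Sum total memory across the cluster.
total=0
for d in "${devices[@]}"; do
  total=$(( total + ${d#*:} ))
done

# Hand each device a contiguous block of layers sized by its memory share.
start=0
for d in "${devices[@]}"; do
  share=$(( layers * ${d#*:} / total ))
  echo "${d%%:*}: layers $start-$(( start + share - 1 ))"
  start=$(( start + share ))
done
```

With these example numbers the split is exact (16 + 8 + 8 layers); a real implementation would also have to handle rounding remainders and per-device overhead.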
To set up Exo, ensure you have Python 3.12+ installed, then run the following commands:
git clone https://github.com/exo-explore/exo.git
cd exo
source install.sh
Once installed, run the exo command on each device. Exo automatically detects other nodes on the network, eliminating complex configuration. A ChatGPT-like web interface is accessible at http://localhost:52415.
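Alongside the web interface, a node also serves a ChatGPT-compatible HTTP API on the same port, so you can query the cluster from scripts. The sketch below assumes a node is already running locally; the model name is a placeholder, so substitute one your cluster actually serves.

```shell
#!/usr/bin/env bash
# Hedged example: query a locally running Exo node's ChatGPT-compatible API.
# "llama-3.2-3b" is an example model name -- replace it with one you have loaded.
BODY='{"model": "llama-3.2-3b", "messages": [{"role": "user", "content": "Hello, Exo!"}]}'

# Only send the request if a node is actually reachable on this machine.
if curl -s --max-time 2 http://localhost:52415/ > /dev/null; then
  curl -s http://localhost:52415/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "$BODY"
else
  echo "No Exo node reachable on localhost:52415"
fi
```

Because the API mimics the ChatGPT request format, existing OpenAI-style client code can usually be pointed at the node by changing only the base URL.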
For a practical demonstration, check out NetworkChuck's tutorial, where he tests Exo on a cluster of five Mac Studios: