
Pluralis-8B Collective Run¶
A decentralized, pipeline-parallel pre-training run: an 8B-parameter transformer trained on contributor GPUs over the public internet.
What is Pluralis-8B?¶
Pluralis-8B is a collective pre-training pilot on Agora, the system that connects a consumer GPU to a collaborative training run. Each participant hosts one pipeline stage of the model; participants can join or leave at any time, and adding more peers to a stage increases data-parallel throughput within that stage.
- Consumer-grade hardware (minimum): 24 GB GPU (RTX 4090, RTX 5090, RTX 6000), 80 GB RAM, 80 GB disk, 200 Mbps network
- Region: compute instances must be located in North America. The current run's peers are NA-based, and a node must have under 80 ms latency to them to be eligible to join.
- Cross-platform: Linux and Windows + WSL2 (CUDA)
- One-command launch: python3 agora_cli.py
- Multi-GPU support: run one node per GPU on the same machine
- Live swarm participation: join an ongoing run, synchronize state, then contribute compute and parameter updates
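The minimum requirements listed above can be turned into a quick pre-flight check before launching a node. The sketch below is illustrative only: `HostSpec` and `preflight` are hypothetical names, not part of Agora, and the caller is assumed to measure the values (GPU memory, RAM, disk, bandwidth, latency to existing peers) by some other means.

```python
from dataclasses import dataclass

# Minimums copied from the requirements list above.
MIN_GPU_GB = 24
MIN_RAM_GB = 80
MIN_DISK_GB = 80
MIN_NET_MBPS = 200
MAX_PEER_LATENCY_MS = 80  # latency to existing peers must be below this cap


@dataclass
class HostSpec:
    """Hypothetical description of a candidate host; not an Agora type."""
    gpu_gb: float
    ram_gb: float
    disk_gb: float
    net_mbps: float
    peer_latency_ms: float


def preflight(spec: HostSpec) -> list[str]:
    """Return the unmet requirements; an empty list means the host qualifies."""
    problems = []
    if spec.gpu_gb < MIN_GPU_GB:
        problems.append(f"GPU memory {spec.gpu_gb} GB < {MIN_GPU_GB} GB")
    if spec.ram_gb < MIN_RAM_GB:
        problems.append(f"RAM {spec.ram_gb} GB < {MIN_RAM_GB} GB")
    if spec.disk_gb < MIN_DISK_GB:
        problems.append(f"disk {spec.disk_gb} GB < {MIN_DISK_GB} GB")
    if spec.net_mbps < MIN_NET_MBPS:
        problems.append(f"network {spec.net_mbps} Mbps < {MIN_NET_MBPS} Mbps")
    if spec.peer_latency_ms >= MAX_PEER_LATENCY_MS:
        problems.append(
            f"peer latency {spec.peer_latency_ms} ms is not under "
            f"{MAX_PEER_LATENCY_MS} ms"
        )
    return problems


if __name__ == "__main__":
    # Example values are made up for illustration.
    host = HostSpec(gpu_gb=24, ram_gb=128, disk_gb=500,
                    net_mbps=1000, peer_latency_ms=35)
    print(preflight(host) or "host meets the listed minimums")
```

Returning the full list of problems, rather than failing on the first one, lets a contributor see everything that needs fixing in a single pass.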
Earning points¶
Every node accrues a score that combines the raw PFLOPs it processes with a baseline of 1 PFLOP per hour of active time in the swarm. The dashboard sums the scores of all peers running under one account and ranks contributors live on the public leaderboard.
A higher score means more PFLOPs: more uptime and faster GPUs both translate directly into a higher rank.
Read the full Points & Leaderboard guide
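As a rough illustration of the scoring rule described above, the sketch below assumes that "combining" means a plain sum of processed PFLOPs and the 1 PFLOP per hour baseline. The exact formula is not stated here, so treat the additive form and the function names (`node_score`, `account_scores`, `leaderboard`) as assumptions rather than the dashboard's actual implementation.

```python
BASELINE_PFLOP_PER_HOUR = 1.0  # baseline credit for active time, from the text


def node_score(processed_pflops: float, active_hours: float) -> float:
    """Score of one node. Assumption: compute and uptime credit simply add."""
    return processed_pflops + BASELINE_PFLOP_PER_HOUR * active_hours


def account_scores(nodes):
    """Sum node scores per account.

    nodes: iterable of (account, processed_pflops, active_hours) tuples.
    """
    totals = {}
    for account, pflops, hours in nodes:
        totals[account] = totals.get(account, 0.0) + node_score(pflops, hours)
    return totals


def leaderboard(nodes):
    """Accounts ordered from highest to lowest total score."""
    totals = account_scores(nodes)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Made-up numbers: alice runs two nodes, bob runs one.
    nodes = [
        ("alice", 120.0, 10.0),
        ("alice", 80.0, 5.0),
        ("bob", 150.0, 24.0),
    ]
    print(leaderboard(nodes))  # [('alice', 215.0), ('bob', 174.0)]
```

Under this assumed rule, bob's single node processes more PFLOPs than either of alice's nodes, but alice still ranks higher because the dashboard sums across all peers under her account.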
Research¶
Four published works underpin the design of Agora.
- Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism. arXiv:2506.01260 · NeurIPS 2025. Introduces Subspace Networks (SSN), the architectural compressor that reduces the size of the activations crossing each pipeline-stage boundary by up to 100×. This is the mechanism that makes WAN-grade pipeline parallelism viable.
- AsyncMesh: Fully Asynchronous Optimization for Data and Pipeline Parallelism. arXiv:2601.22442. Agora uses the asynchronous sparse parameter averaging from this paper: same-stage workers AllReduce 5% of their parameters every 20 local steps, with successive rounds covering non-overlapping slices, in parallel with ongoing training. Data-parallel synchronization never stalls the training loop on a full all-reduce.
- Pluralis' Multi-party Training Stack. pluralis.ai/blog. The engineering write-up that integrates the individual mechanisms into a complete system.
- SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient. arXiv:2301.11913 · ICML 2023. The original distributed-pipeline paper Agora builds on.
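The sparse parameter averaging schedule mentioned in the AsyncMesh entry above (5% of parameters every 20 local steps, with non-overlapping slices across rounds) can be sketched in plain Python. This is a simplified, synchronous stand-in: it assumes contiguous slices that rotate in order and replaces the all-reduce with an in-process mean, whereas the real system runs the exchange in parallel with training and may pick slices differently. The names `slice_bounds` and `maybe_average` are hypothetical.

```python
FRACTION = 0.05   # share of parameters averaged per round (from the text)
SYNC_EVERY = 20   # local steps between averaging rounds (from the text)


def slice_bounds(num_params: int, round_idx: int, fraction: float = FRACTION):
    """Return the [start, end) index range averaged in a given round.

    Assumption: slices are contiguous and rotate in order, so 1 / fraction
    successive rounds touch every parameter exactly once.
    """
    rounds_per_cycle = round(1 / fraction)  # 20 rounds for a 5% slice
    k = round_idx % rounds_per_cycle
    start = k * num_params // rounds_per_cycle
    end = (k + 1) * num_params // rounds_per_cycle
    return start, end


def maybe_average(workers, step):
    """Average one slice across same-stage workers if `step` is a sync step.

    workers: list of equal-length lists of floats, one flat parameter
    vector per worker. Returns the averaged range, or None if nothing ran.
    """
    if step == 0 or step % SYNC_EVERY != 0:
        return None
    round_idx = step // SYNC_EVERY - 1
    start, end = slice_bounds(len(workers[0]), round_idx)
    for i in range(start, end):
        mean = sum(w[i] for w in workers) / len(workers)
        for w in workers:
            w[i] = mean  # every worker ends the round with the same value
    return start, end


if __name__ == "__main__":
    # Two toy workers with 100 parameters each and no local updates.
    workers = [[0.0] * 100, [2.0] * 100]
    for step in range(1, 401):
        maybe_average(workers, step)
    print(workers[0] == workers[1])
```

Under these assumptions, a full pass over the parameters takes 20 rounds × 20 local steps = 400 local steps, while each individual round moves only one twentieth of the parameters over the network.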