Skip to content

Agora — Pluralis Research

Pluralis-8B Collective Run

A decentralized pipeline-parallel pre-training run. 8B-parameter transformer, served by contributor GPUs over the public internet.

Agora Dashboard   Agora Quick Start


What is Pluralis-8B?

Pluralis-8B is a collective pre-training pilot on Agora, the system that connects a consumer GPU to a collaborative training run. Each participant hosts one pipeline stage of the model; participants can join or leave at any time, and adding more peers to a stage increases data-parallel throughput within that stage.

  • Consumer-grade hardware (minimum): 24 GB GPU (RTX 4090, RTX 5090, RTX 6000), 80 GB RAM, 80 GB disk, 200 Mbps network
  • Region: compute instances must be located in North America (current run's peers are NA-based; the < 80 ms latency cap to them gates join eligibility)
  • Cross-platform: Linux and Windows + WSL2 (CUDA)
  • One-command launch: python3 agora_cli.py
  • Multi-GPU support: run one node per GPU on the same machine
  • Live swarm participation: join an ongoing run, synchronize state, then contribute compute and parameter updates
Discovery
Seeds
Stateless DHT bootstrap. Routing only, no model data.
Compute
Workers
Pipeline-parallel stages. Forward / backward / AllReduce. Heterogeneous GPUs.
Coordination
Trainers
CPU only. Microbatch routing, load balancing, data sharding.
Three roles, three layers. See the full Training Architecture for the per-stage view.

Earning points

Every node accrues a score combining the raw pflops it processes with a baseline 1 PFLOP per hour for time spent active in the swarm. The dashboard sums scores across all the peers running under one account and ranks contributors live on the public leaderboard.

Higher = more pflops

More uptime and faster GPUs both translate directly to a higher rank.

Read the full Points & Leaderboard guide


Research

Four published works underwrite the design of Agora.

  1. Protocol Models: Scaling Decentralized Training with Communication-Efficient Model ParallelismarXiv:2506.01260 · NeurIPS 2025. Subspace Networks (SSN), the architectural compressor that reduces the activation crossing each pipeline-stage boundary by up to 100×. The mechanism that makes WAN-grade pipeline parallelism viable.
  2. AsyncMesh: Fully Asynchronous Optimization for Data and Pipeline ParallelismarXiv:2601.22442. Agora uses the asynchronous sparse parameter averaging from this paper: same-stage workers AllReduce 5% of their parameters every 20 local steps, with successive rounds covering non-overlapping slices, in parallel with ongoing training. Data-parallel synchronization never stalls the training loop on a full all-reduce.
  3. Pluralis' Multi-party Training Stackpluralis.ai/blog. The engineering write-up that integrates the individual mechanisms into a complete system.
  4. SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-EfficientarXiv:2301.11913 · ICML 2023. The original distributed-pipeline paper Agora builds on.