Skip to content

Agora — Pluralis Research

Pluralis-8B Collective Run

Agora coordinates a decentralized, pipeline-parallel pre-training run for an 8B-parameter transformer. Contributor GPUs host stages of the model over the public internet; the system routes microbatches, synchronizes same-stage workers, and keeps the run moving as peers join and leave.

Agora dashboard · Quick start · Training architecture


Current run

Pluralis-8B is a collective pre-training pilot on Agora, the system that connects a consumer GPU to a collaborative training run. Each participant hosts one pipeline stage of the model. Adding more peers to a stage increases data-parallel throughput within that stage.

  • Hardware baseline: 24 GB GPU (RTX 4090, RTX 5090, RTX 6000), 80 GB RAM, 80 GB disk, 200 Mbps network
  • Region: compute instances must be located in North America; the current run's peers are NA-based, and the < 80 ms latency cap gates join eligibility
  • Platform: Linux and Windows + WSL2 (CUDA)
  • Launch path: python3 agora_cli.py
  • Multi-GPU support: run one node per GPU on the same machine
  • Swarm participation: join an ongoing run, synchronize state, then contribute compute and parameter updates

Agora has three roles in the live system: seed nodes for DHT bootstrap and routing, worker nodes for pipeline stages and same-stage synchronization, and trainer processes for microbatch routing, load balancing, and data sharding. For the per-stage view, read the Training Architecture.


Points

Every node accrues a score combining the raw pflops it processes with a baseline 1 PFLOP per hour for time spent active in the swarm. The dashboard sums scores across all peers running under one account and ranks contributors live on the public leaderboard.

More uptime and faster GPUs both translate directly to a higher rank. The full mechanics are documented in Points & Leaderboard.


Research foundations

Agora is built from several pieces of published research and engineering work.

  1. Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism. arXiv:2506.01260 · NeurIPS 2025. Subspace Networks (SSN), the architectural compressor that reduces the activation crossing each pipeline-stage boundary by up to 100×. This is the mechanism that makes WAN-grade pipeline parallelism viable.
  2. AsyncMesh: Fully Asynchronous Optimization for Data and Pipeline Parallelism. arXiv:2601.22442. Agora uses asynchronous sparse parameter averaging from this paper: same-stage workers AllReduce 5% of their parameters every 20 local steps, with successive rounds covering non-overlapping slices, in parallel with ongoing training.
  3. Pluralis' Multi-party Training Stack. pluralis.ai/blog. The engineering write-up that integrates the individual mechanisms into a complete system.
  4. SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient. arXiv:2301.11913 · ICML 2023. The original distributed-pipeline paper Agora builds on.