About
Pluralis is a research lab focused on collectively-owned AI.
Closed models capture enormous value but lead to an unacceptable concentration of power. Open-weight models distribute power but face massive headwinds to financial sustainability. Our work is on a third path: collective, community-driven training that is self-sustaining. Our team came from Google, Anthropic, and Amazon, where we worked together for many years before Pluralis. We publish openly.
We are currently carrying out open, multi-participant training runs. You can find information about previous runs here and about the current run here, and you can apply to join the planning and development of future runs here.
Research
Factored Gossip DiLoCo: Reducing Blocking Communication within DiLoCo
ICML 2026
C. Koneputugodage, T. Ajanthan, S. Ramasinghe, H. Dolatabadi, S. Siriwardhana, G. Avraham, V. Shevchenko, K. Pajak, J. Snewin, A. Long
We relax DiLoCo’s exact outer synchronization to approximate synchronization via mixing and gossip, and factorize it into a non-blocking step, which overlaps with computation and introduces no staleness, and a blocking step that tightens agreement between workers. On billion-parameter language models in low-bandwidth settings, the method substantially improves compute utilization while matching DiLoCo’s training progress, and it is more robust to failures.
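To make the difference between exact and approximate synchronization concrete, here is a minimal sketch of generic ring gossip averaging, not the paper’s Factored Gossip DiLoCo algorithm. It assumes each worker holds a flat parameter list, that exact synchronization is a plain all-reduce mean, and that gossip uses a ring topology with a hypothetical `self_weight`; all function names are illustrative.

```python
def exact_average(params):
    """Exact outer synchronization: every worker receives the global mean (an all-reduce)."""
    n = len(params)
    dim = len(params[0])
    mean = [sum(p[i] for p in params) / n for i in range(dim)]
    return [list(mean) for _ in range(n)]


def ring_gossip_step(params, self_weight=0.5):
    """One gossip round on a ring (illustrative topology): each worker mixes with its two neighbours.

    The implied mixing matrix is symmetric and doubly stochastic, so the
    global mean is preserved while disagreement between workers shrinks.
    """
    n = len(params)
    neighbour_weight = (1.0 - self_weight) / 2.0
    mixed = []
    for k in range(n):
        left, right = params[(k - 1) % n], params[(k + 1) % n]
        mixed.append([
            self_weight * params[k][i] + neighbour_weight * (left[i] + right[i])
            for i in range(len(params[k]))
        ])
    return mixed


def disagreement(params):
    """Sum of squared distances of each worker's parameters from the global mean."""
    mean = exact_average(params)[0]
    return sum((p[i] - mean[i]) ** 2 for p in params for i in range(len(mean)))


# Four workers with one parameter each.
workers = [[0.0], [4.0], [8.0], [12.0]]
mixed = ring_gossip_step(workers)
print(mixed)                                        # [[4.0], [4.0], [8.0], [8.0]]
print(disagreement(workers), disagreement(mixed))   # 80.0 16.0
```

A gossip round only talks to neighbours, so it can be cheaper to overlap with computation than a global all-reduce, at the cost of leaving some residual disagreement that a later (blocking) step must tighten.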
S. Ramasinghe, T. Ajanthan, H. Dolatabadi, C. Koneputugodage, G. Avraham, V. Shevchenko, Y. Zuo, K. Pajak, A. Long
We introduce a fast online curvature estimator that tracks preconditioned Hessian behavior during billion-parameter Transformer training. It reveals depth-driven curvature surges behind loss spikes and motivates architecture warm-up: progressively growing depth to stabilize training without slowing convergence.
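One standard way to track the sharpest curvature direction is power iteration on Hessian-vector products. The sketch below estimates the largest eigenvalue magnitude of a preconditioned Hessian P⁻¹H, assuming a positive diagonal preconditioner (such as an Adam-style second-moment estimate). It is a generic textbook technique, not the estimator from the paper; `hvp`, `precond_diag`, and `matvec` are hypothetical names, and in a real model `hvp` would come from autodiff rather than an explicit matrix.

```python
import math
import random


def matvec(matrix, v):
    """Dense matrix-vector product, used here only to build a toy Hessian-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]


def preconditioned_top_curvature(hvp, precond_diag, dim, iters=100, seed=0):
    """Estimate the largest |eigenvalue| of P^{-1} H by power iteration.

    `hvp(v)` must return H v; `precond_diag` is the (assumed positive) diagonal of P.
    """
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    estimate = 0.0
    for _ in range(iters):
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        # Apply the preconditioned Hessian: w = P^{-1} (H v).
        w = [hv / p for hv, p in zip(hvp(v), precond_diag)]
        # With ||v|| = 1, ||w|| converges to the dominant eigenvalue magnitude.
        estimate = math.sqrt(sum(x * x for x in w))
        v = w
    return estimate


# Toy example: H = diag(8, 3), P = diag(4, 1), so P^{-1} H = diag(2, 3).
H = [[8.0, 0.0], [0.0, 3.0]]
print(preconditioned_top_curvature(lambda v: matvec(H, v), [4.0, 1.0], dim=2))  # close to 3.0
```

Running such an estimate every few steps gives a cheap curvature trace; a sudden rise in it is the kind of signal one would look for ahead of a loss spike.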
S. Ramasinghe, T. Ajanthan, G. Avraham, Y. Zuo, A. Long
This work demonstrates that model-parallel training over low-bandwidth networks is possible: an 8B LLaMA model trains on par with centralized training even though its transformer blocks are split across four locations connected only by standard internet links.
T. Ajanthan, S. Ramasinghe, Y. Zuo, G. Avraham, A. Long
Pipeline parallelism trains large models by splitting them into stages, but idle “bubbles” slow training, especially when network latency is high. Our Nesterov-based method corrects stale updates and outperforms both existing asynchronous techniques and the synchronous baseline.
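Why bubbles grow with latency can be seen in a simple cost model. The sketch below assumes an idealized synchronous (GPipe-style) schedule for a single pass with uniform per-stage compute time and a fixed per-hop link latency; the function name and parameters are hypothetical, and the model ignores bandwidth and the asynchronous correction the paper proposes.

```python
def pipeline_bubble_fraction(num_stages, num_microbatches, stage_time, link_latency=0.0):
    """Idle fraction per stage in an idealized synchronous pipeline pass.

    Each stage is busy for num_microbatches * stage_time. The last microbatch
    leaves the last stage after (num_stages - 1) extra compute slots plus one
    link latency per hop, which is the fill/drain "bubble".
    """
    busy = num_microbatches * stage_time
    fill_and_drain = (num_stages - 1) * (stage_time + link_latency)
    total = busy + fill_and_drain
    return 1.0 - busy / total


# 4 stages, 8 microbatches, unit compute time per stage.
print(pipeline_bubble_fraction(4, 8, 1.0))                    # 3/11, about 0.27
print(pipeline_bubble_fraction(4, 8, 1.0, link_latency=1.0))  # 3/7, about 0.43
```

With zero latency this reduces to the familiar (p − 1)/(m + p − 1) bubble fraction; adding latency on every hop inflates the fill and drain phases, which is the regime where asynchronous schedules that tolerate stale updates become attractive.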
A. Long*, C. Koneputugodage*, S. Ramasinghe, T. Ajanthan, G. Avraham, Y. Zuo
UPMs enable collaborative training and inference without ever materializing the full model weights for any participant, making decentralized models unextractable in practice.
S. Ramasinghe, T. Ajanthan, H. Dolatabadi, G. Avraham, V. Shevchenko, Y. Zuo, C. Koneputugodage, A. Long
We introduce a compression method for communication-efficient context parallelism that achieves over 95% compression with negligible overhead and no convergence loss. By exploiting low-rank activation structure through learned mixtures of subspaces, it scales billion-parameter decentralized models to 100K+ context lengths on 300 Mbps networks while matching centralized wall-clock convergence.
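The core idea of low-rank compression can be sketched as follows: if an activation vector lies close to one of several low-dimensional subspaces, a sender can transmit the subspace index plus a few coefficients instead of the full vector. This is a minimal illustration under stated assumptions (fixed orthonormal bases given as row vectors, selection by smallest reconstruction error); how the subspaces are learned in the paper is not shown, and all names and the example dimensions are hypothetical.

```python
def project(basis, x):
    """Coefficients of x in an orthonormal basis given as a list of row vectors."""
    return [sum(b_i * x_i for b_i, x_i in zip(b, x)) for b in basis]


def reconstruct(basis, coeffs):
    """Map coefficients back to the full activation dimension."""
    dim = len(basis[0])
    return [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(dim)]


def compress(subspaces, x):
    """Pick the subspace with the smallest reconstruction error; return (index, coefficients)."""
    best = None
    for k, basis in enumerate(subspaces):
        coeffs = project(basis, x)
        x_hat = reconstruct(basis, coeffs)
        err = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        if best is None or err < best[0]:
            best = (err, k, coeffs)
    return best[1], best[2]


def decompress(subspaces, index, coeffs):
    """Receiver side: rebuild the activation from the chosen subspace."""
    return reconstruct(subspaces[index], coeffs)


def compression_ratio(dim, rank):
    """Fraction of values saved per vector, ignoring the small index overhead."""
    return 1.0 - rank / dim


# Two rank-2 subspaces of a 4-dimensional space.
subspaces = [
    [[1, 0, 0, 0], [0, 1, 0, 0]],
    [[0, 0, 1, 0], [0, 0, 0, 1]],
]
index, coeffs = compress(subspaces, [0, 0, 3, -2])
print(index, coeffs, decompress(subspaces, index, coeffs))  # 1 [3, -2] [0, 0, 3, -2]
print(compression_ratio(4096, 128))  # 0.96875 for a hypothetical rank-128 code of a 4096-dim vector
```

Sending only `rank` coefficients per vector is what turns activation traffic from O(dim) into O(rank); a mixture of subspaces lets different tokens use different bases without raising the rank.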