Pluralis Research

About

Pluralis is a research lab focused on collectively-owned AI.

Closed models capture enormous value but lead to an unacceptable concentration of power. Open-weight models distribute power but face massive headwinds to becoming financially sustainable. Our work is on a third path: collective, community-driven training that is self-sustaining. Our team came from Google, Anthropic, and Amazon, where we worked together for many years prior to Pluralis. We publish openly.

We are currently carrying out open, multi-participant training runs. You can find information about previous runs here and about the current run here, and you can apply to join the planning and development of future runs here.

Research

Factored Gossip DiLoCo: Reducing Blocking Communication within DiLoCo

ICML 2026

C. Koneputugodage, T. Ajanthan, S. Ramasinghe, H. Dolatabadi, S. Siriwardhana, G. Avraham, V. Shevchenko, K. Pajak, J. Snewin, A. Long

We relax DiLoCo’s exact outer synchronization to approximate synchronization via mixing and gossip, factorizing it into a non-blocking step that overlaps with computation without introducing staleness and a blocking step that tightens agreement between workers. On billion-parameter language models in low-bandwidth settings, the method substantially improves compute utilization while matching DiLoCo’s training progress, and it is more robust to failures.
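For readers unfamiliar with gossip averaging, here is a minimal pure-Python sketch contrasting an exact outer average (what an all-reduce computes) with repeated gossip rounds in which each worker only mixes with its neighbors. The ring topology, the uniform 1/3 mixing weights, and the function names are illustrative assumptions, not the paper's configuration, and the sketch does not model the factored non-blocking and blocking steps.

```python
def exact_average(params):
    """All-reduce style averaging: every worker receives the mean of all workers."""
    n, dim = len(params), len(params[0])
    mean = [sum(p[i] for p in params) / n for i in range(dim)]
    return [list(mean) for _ in range(n)]


def gossip_step(params):
    """One gossip round on a ring (needs n >= 3): each worker averages itself
    with its two neighbors using weights 1/3. The mixing matrix is doubly
    stochastic, so the global mean is preserved."""
    n, dim = len(params), len(params[0])
    mixed = []
    for k in range(n):
        left, me, right = params[(k - 1) % n], params[k], params[(k + 1) % n]
        mixed.append([(left[i] + me[i] + right[i]) / 3.0 for i in range(dim)])
    return mixed


def disagreement(params):
    """Largest squared distance of any worker's parameters from the global mean."""
    mean = exact_average(params)[0]
    return max(sum((p[i] - mean[i]) ** 2 for i in range(len(mean))) for p in params)


# Six hypothetical workers, each holding a 2-dimensional parameter vector.
workers = [[float(k), float(k * k)] for k in range(6)]

mixed = workers
for _ in range(10):  # ten gossip rounds, each touching only neighbors
    mixed = gossip_step(mixed)

print("disagreement before:", disagreement(workers))
print("after 10 gossip rounds:", disagreement(mixed))
print("after exact averaging:", disagreement(exact_average(workers)))
```

Each gossip round preserves the global mean while shrinking disagreement between workers; exact averaging removes the disagreement in one round, but only through a collective that every worker must wait on.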

Taming Curvature: Architecture Warm-up for Stable Transformer Training

ICLR 2026

S. Ramasinghe, T. Ajanthan, H. Dolatabadi, C. Koneputugodage, G. Avraham, V. Shevchenko, Y. Zuo, K. Pajak, A. Long

We introduce a fast online curvature estimator that tracks preconditioned Hessian behavior during billion-parameter Transformer training. It reveals depth-driven curvature surges behind loss spikes and motivates architecture warm-up: progressively growing depth to stabilize training without slowing convergence.
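To make "tracking preconditioned Hessian behavior" concrete, the sketch below estimates the largest eigenvalue of a preconditioned Hessian using only Hessian-vector products, via plain power iteration on a toy quadratic. Power iteration is a generic, swapped-in technique chosen for illustration; the matrix H, the diagonal preconditioner P, and all function names are hypothetical, and this is not the paper's estimator. Architecture warm-up is not modeled.

```python
import math

# Toy quadratic loss 0.5 * w^T H w and a diagonal preconditioner P
# (both hypothetical; P plays the role of, e.g., an Adam-style scaling).
H = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 0.0],
     [0.0, 0.0, 1.0]]
P = [2.0, 1.0, 1.0]


def grad(w):
    """Gradient of 0.5 * w^T H w, i.e. H w."""
    return [sum(H[i][j] * w[j] for j in range(len(w))) for i in range(len(w))]


def hvp(w, v, eps=1e-3):
    """Hessian-vector product by central differences of the gradient."""
    g_plus = grad([wi + eps * vi for wi, vi in zip(w, v)])
    g_minus = grad([wi - eps * vi for wi, vi in zip(w, v)])
    return [(a - b) / (2 * eps) for a, b in zip(g_plus, g_minus)]


def top_preconditioned_eigenvalue(w, iters=100):
    """Power iteration on P^{-1/2} H P^{-1/2}, which has the same eigenvalues
    as the preconditioned Hessian P^{-1} H. Only Hessian-vector products are used."""
    s = [1.0 / math.sqrt(p) for p in P]
    v = [1.0] * len(w)
    lam = 0.0
    for _ in range(iters):
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        hv = hvp(w, [si * vi for si, vi in zip(s, v)])
        u = [si * x for si, x in zip(s, hv)]
        lam = sum(a * b for a, b in zip(v, u))  # Rayleigh quotient estimate
        v = u
    return lam


lam = top_preconditioned_eigenvalue([0.5, -0.2, 1.0])
print(lam)  # analytic value for this toy case: (5 + sqrt(3)) / 2, about 3.366
```

Because only gradient evaluations are needed, an estimate like this can in principle be refreshed periodically during training without ever forming the Hessian.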

Subspace Networks: Scaling Decentralized Training with Communication-Efficient Model Parallelism

NeurIPS 2025

S. Ramasinghe, T. Ajanthan, G. Avraham, Y. Zuo, A. Long

This work demonstrates that model-parallel training over low-bandwidth networks is possible: we train an 8B LLaMA model on par with centralized training while its transformer blocks are split across four locations connected only by standard internet links.
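The basic communication saving behind sending low-dimensional coordinates instead of full activations between pipeline stages can be sketched as follows, assuming activations that already lie exactly in a known k-dimensional subspace with orthonormal basis U shared by both stages. The dimensions, the random basis, and the helper names are hypothetical; how the paper constrains or learns the subspace is not modeled here.

```python
import random


def orthonormal_basis(d, k, seed=0):
    """Return a d x k matrix with orthonormal columns (Gram-Schmidt on random vectors)."""
    rng = random.Random(seed)
    cols = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for c in cols:
            dot = sum(a * b for a, b in zip(v, c))
            v = [a - dot * b for a, b in zip(v, c)]
        norm = sum(a * a for a in v) ** 0.5
        cols.append([a / norm for a in v])
    return [[cols[j][i] for j in range(k)] for i in range(d)]


def compress(U, x):
    """Sender side: project a d-dim activation onto the subspace, c = U^T x (k numbers)."""
    return [sum(U[i][j] * x[i] for i in range(len(x))) for j in range(len(U[0]))]


def decompress(U, c):
    """Receiver side: map the k coordinates back to d dimensions, x_hat = U c."""
    return [sum(U[i][j] * c[j] for j in range(len(c))) for i in range(len(U))]


D, K = 64, 4                       # hypothetical hidden size and subspace rank
U = orthonormal_basis(D, K)        # basis shared by both pipeline stages
rng = random.Random(1)
x = decompress(U, [rng.gauss(0.0, 1.0) for _ in range(K)])  # activation inside the subspace

coeffs = compress(U, x)            # only K floats cross the network link
x_hat = decompress(U, coeffs)
print(f"sent {len(coeffs)} of {D} values; max error",
      max(abs(a - b) for a, b in zip(x, x_hat)))
```

When activations lie in the shared subspace, the receiver reconstructs them exactly from D/K times fewer numbers; any component outside the subspace would be lost, which is why the subspace structure itself matters.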

Nesterov Method for Asynchronous Pipeline Parallel Optimization

ICML 2025

T. Ajanthan, S. Ramasinghe, Y. Zuo, G. Avraham, A. Long

Pipeline parallelism trains large models by splitting them into stages, but idle “bubbles” slow training, especially when network latency is high. Asynchronous pipelines avoid these bubbles but apply stale updates; our Nesterov method corrects for this staleness and outperforms existing asynchronous techniques as well as the synchronous baseline.
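The staleness problem can be made concrete with a toy simulation: Nesterov momentum on a 1-D quadratic where each gradient is computed at a look-ahead point but only applied `delay` steps later, as happens in an asynchronous pipeline. This is a plain stale-gradient Nesterov loop with hypothetical hyperparameters, written only for illustration; it is not the correction proposed in the paper.

```python
from collections import deque


def grad(w, a=1.0):
    """Gradient of the toy loss 0.5 * a * w**2."""
    return a * w


def stale_nesterov(w0=1.0, lr=0.05, mu=0.5, delay=2, steps=200):
    """Nesterov momentum where each gradient is evaluated at a look-ahead point
    but only applied `delay` steps later (hypothetical hyperparameters)."""
    w, v = w0, 0.0
    # Look-ahead points still "in flight" through the pipeline.
    in_flight = deque([w0] * delay)
    for _ in range(steps):
        in_flight.append(w + mu * v)      # launch work at the current look-ahead point
        g = grad(in_flight.popleft())     # receive the gradient launched `delay` steps ago
        v = mu * v - lr * g
        w = w + v
    return w


print("no delay:", stale_nesterov(delay=0))
print("delay 2: ", stale_nesterov(delay=2))
```

With `delay=0` the loop reduces to standard Nesterov momentum; raising `delay` (or the learning rate) shows how stale gradients interact with momentum, which is the effect an asynchronous method has to compensate for.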

Unextractable Protocol Models: Collaborative Training and Inference without Weight Materialization

NeurIPS 2025

A. Long*, C. Koneputugodage*, S. Ramasinghe, T. Ajanthan, G. Avraham, Y. Zuo

UPMs enable collaborative training and inference without ever materializing the full model weights for any participant, making decentralized models unextractable in practice.
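One simple way to see how a model can be split so that no participant holds the real weights is to insert a secret invertible transform at a shard boundary: participant A stores R·W1 and participant B stores W2·R⁻¹, so the composed function is unchanged while neither shard equals the original layer. The sketch below uses two linear layers, a rotation as R, and made-up matrices; it illustrates only this general idea, is not the UPM construction or a security argument, and ignores the nonlinearities a real Transformer has between layers.

```python
import math


def matmul(A, B):
    """Plain matrix product of nested lists: (m x n) @ (n x p)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def rotation(theta):
    """2x2 rotation matrix; orthogonal, so its inverse is its transpose."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


W1 = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0]]  # hypothetical 2x3 layer on participant A
W2 = [[2.0, 1.0], [-1.0, 0.5]]            # hypothetical 2x2 layer on participant B
x = [[1.0], [2.0], [-1.0]]                # input as a column vector

R = rotation(0.7)                          # secret transform at the A/B boundary
A_shard = matmul(R, W1)                    # what participant A actually stores
B_shard = matmul(W2, transpose(R))         # what participant B actually stores

y_plain = matmul(W2, matmul(W1, x))        # original two-layer computation
y_shard = matmul(B_shard, matmul(A_shard, x))  # same computation using the shards
print(y_plain, y_shard)
```

The transform cancels inside the product W2·R⁻¹·R·W1, so outputs match while the stored shards differ from W1 and W2.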

Mixtures of Subspaces for Bandwidth-Efficient Context Parallel Training

NeurIPS 2025

S. Ramasinghe, T. Ajanthan, H. Dolatabadi, G. Avraham, V. Shevchenko, Y. Zuo, C. Koneputugodage, A. Long

We introduce a compression method for communication-efficient context parallelism that achieves over 95% compression with negligible overhead and no convergence loss. By exploiting low-rank activation structure through learned mixtures of subspaces, it scales billion-parameter decentralized models to 100K+ context lengths on 300 Mbps networks while matching centralized wall-clock convergence.
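A back-of-the-envelope calculation shows why compression matters at this scale. The sketch below estimates the time to send one tokens × hidden activation tensor over a 300 Mbps link, assuming a hypothetical hidden size of 4096, 2-byte (bf16) values, and no latency or protocol overhead; these numbers are illustrative and are not taken from the paper.

```python
def transfer_seconds(tokens, hidden, bytes_per_value, link_mbps, compression=0.0):
    """Time to send one tokens x hidden tensor over a link, ignoring latency and
    overheads. `compression` is the fraction of bytes removed (0.95 = 95% smaller)."""
    payload_bits = tokens * hidden * bytes_per_value * 8 * (1.0 - compression)
    return payload_bits / (link_mbps * 1e6)


# Hypothetical setting: 100K tokens, hidden size 4096, bf16, 300 Mbps link.
raw = transfer_seconds(100_000, 4096, 2, 300)
compressed = transfer_seconds(100_000, 4096, 2, 300, compression=0.95)
print(f"uncompressed: {raw:.2f} s, 95% compressed: {compressed:.2f} s")
```

Under these assumptions a single uncompressed transfer takes about 21.8 s, and removing 95% of the bytes brings it to about 1.1 s; a real training step involves many such exchanges, so the saving compounds.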
