PLURALIS RESEARCH

About

Pluralis Research works on Protocol Learning: multi-participant training of foundation models where no single participant has, or can ever obtain, a full copy of the model. Model weights are split across devices and are created, used, and modified only within the protocol.
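As a toy illustration of this sharding property only (the participant names, depth, and layer type below are hypothetical, not Pluralis code), each participant builds just its own contiguous slice of transformer blocks, so no single party ever materializes the full weight set:

    # Toy illustration (hypothetical names and sizes, not Pluralis code):
    # each participant instantiates only its own contiguous slice of layers.
    import torch.nn as nn

    NUM_LAYERS = 32                                   # assumed model depth
    PARTICIPANTS = ["alice", "bob", "carol", "dave"]  # hypothetical parties

    def layer_range(rank: int, world_size: int, num_layers: int) -> range:
        """Contiguous block of layer indices owned by participant `rank`."""
        per_stage = num_layers // world_size
        start = rank * per_stage
        end = num_layers if rank == world_size - 1 else start + per_stage
        return range(start, end)

    def build_local_stage(rank: int) -> nn.Module:
        """Build only the layers this participant owns."""
        blocks = [nn.TransformerEncoderLayer(d_model=512, nhead=8)
                  for _ in layer_range(rank, len(PARTICIPANTS), NUM_LAYERS)]
        return nn.Sequential(*blocks)

    for rank, name in enumerate(PARTICIPANTS):
        owned = layer_range(rank, len(PARTICIPANTS), NUM_LAYERS)
        stage = build_local_stage(rank)
        n_params = sum(p.numel() for p in stage.parameters())
        print(f"{name} holds layers {owned.start}-{owned.stop - 1} "
              f"({n_params / 1e6:.1f}M params) and never the other stages")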

Protocol Learning makes the model unextractable: no participant can take the model and use it outside the protocol. Models become protocol assets, owned collectively by the training participants, which allows programmatic value flow from model revenue. In turn, this makes possible an economically sustainable foundation model supply chain outside of corporate and governmental control, enables genuine open innovation at the model layer, and opens a path to potentially unprecedented scale.

Today's open-source AI provides none of these properties; it depends on some entity somewhere spending enormous sums to train a model and then releasing it for free. Meanwhile, a society-wide platform dependency on centralized model providers is emerging as they become increasingly critical to everyday life. We believe it is dangerous for technology that significantly shapes people's decision-making and worldviews to be developed in closed, corporate settings.

Research

Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism

S. Ramasinghe, T. Ajanthan, G. Avraham, Y. Zuo, A. Long  |  arXiv:2506.01260, 2025

This is the first work to show that model-parallel training over low-bandwidth networks is possible. Specifically, it demonstrates an 8B LLaMA model trained on par with centralized training while the devices holding consecutive transformer blocks sit in four different locations, connected only by standard internet connections. Prior to this work, this was widely considered impossible.
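For intuition, here is a back-of-the-envelope sketch of why this was thought infeasible (the sizes and bandwidths are our own illustrative assumptions, not figures from the paper): the activations crossing each stage boundary are large relative to consumer internet bandwidth.

    # Rough arithmetic with assumed numbers (not from the paper): the fp16
    # activations crossing each stage boundary per micro-batch, and how long
    # they take to send over fast vs. slow links.
    HIDDEN = 4096        # assumed hidden size for an ~8B model
    SEQ_LEN = 2048       # assumed sequence length
    MICRO_BATCH = 1
    BYTES_FP16 = 2

    payload_mb = HIDDEN * SEQ_LEN * MICRO_BATCH * BYTES_FP16 / 1e6

    links_gbps = {"datacenter interconnect (assumed 100 Gb/s)": 100.0,
                  "consumer internet (assumed 100 Mb/s)": 0.1}
    for name, gbps in links_gbps.items():
        ms = payload_mb * 8 / gbps            # megabits / (Gb/s) = milliseconds
        print(f"{name}: {payload_mb:.1f} MB per hop -> {ms:.1f} ms")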

Nesterov Method for Asynchronous Pipeline Parallel Optimization

T. Ajanthan, S. Ramasinghe, Y. Zuo, G. Avraham, A. Long  |  ICML 2025

Pipeline parallelism allows large models to be trained across many small devices by slicing the network into sequential stages. It suffers from a “bubble” problem: devices sit idle while waiting on neighboring stages. The bubble slows down both centralized and decentralized training, but the effect is more pronounced in the decentralized case, where communication lag widens it further. We solve this with an asynchronous Nesterov-based optimizer, outperforming all existing asynchronous techniques and even the synchronous baseline.
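For intuition, a small sketch of the bubble arithmetic: the first function is the standard idle-time fraction for a synchronous GPipe-style schedule with p stages and m micro-batches; the latency-aware variant is our own rough extension to show why slow inter-stage links make the bubble worse, and is not the model used in the paper.

    # Standard idle-time fraction for a synchronous GPipe-style pipeline
    # with `stages` stages and `micro_batches` micro-batches per step.
    def bubble_fraction(stages: int, micro_batches: int) -> float:
        return (stages - 1) / (micro_batches + stages - 1)

    # Rough extension (our own, not from the paper): treat each inter-stage
    # hop as extra dead time during pipeline fill/drain, so higher link
    # latency inflates the idle share.
    def bubble_with_latency(stages: int, micro_batches: int,
                            t_stage: float, t_link: float) -> float:
        work = micro_batches * t_stage
        idle = (stages - 1) * (t_stage + t_link)
        return idle / (work + idle)

    p, m = 4, 16
    print(f"datacenter-style bubble: {bubble_fraction(p, m):.1%}")
    print(f"with 200 ms links, 50 ms stage compute: "
          f"{bubble_with_latency(p, m, t_stage=0.05, t_link=0.2):.1%}")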


Team

Media