About
Pluralis Research works on Protocol Learning: multi-participant training of foundation models where no single participant has,
or can ever obtain, a full copy of the model. Model weights are split across devices and are created, used, and modified only
within the protocol.
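The core property can be pictured with a toy partitioning sketch. Everything below is illustrative only, not Pluralis's actual protocol (the function name and contiguous-block assignment are assumptions): it simply shows how layers can be assigned so that each participant holds only its own slice of the weights and no one can assemble the full model alone.
```python
# Illustrative sketch only, not Pluralis code: the function name and the
# contiguous-block assignment are assumptions. It shows the core property that
# each participant holds only a slice of the layers, never the full model.
from typing import Dict, List

def assign_layer_shards(num_layers: int, participants: List[str]) -> Dict[str, List[int]]:
    """Give each participant a contiguous block of transformer layers."""
    shards: Dict[str, List[int]] = {p: [] for p in participants}
    for layer in range(num_layers):
        owner = participants[layer * len(participants) // num_layers]
        shards[owner].append(layer)
    return shards

# Example: a 32-layer model split across four participants; each holds 8 layers.
shards = assign_layer_shards(num_layers=32, participants=["node_a", "node_b", "node_c", "node_d"])
for node, layers in shards.items():
    print(node, "holds layers", layers[0], "to", layers[-1])  # no node holds all 32
```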
This approach makes the model unextractable: no participant can take the model and use it outside the protocol. Models become protocol assets, owned collectively by the training participants, which enables programmatic value flow from model revenue back to those participants. In turn, this makes possible an economically sustainable foundation-model supply chain outside of corporate and governmental control, allows genuine open innovation at the model layer, and opens a path to potentially unprecedented scale.
Today's open-source AI provides none of these properties: it depends on some entity, somewhere, spending enormous sums to train a model and then releasing it for free.
Meanwhile, a society-wide platform dependency on centralized model providers is emerging as they become increasingly critical to everyday life. We believe it’s dangerous for technology that significantly shapes people’s decision-making and worldviews to be developed in closed, corporate settings.
Research
S. Ramasinghe, T. Ajanthan, G. Avraham, Y. Zuo, A. Long | arXiv:2506.01260, 2025
This is the first work to show that model-parallel training over low-bandwidth networks is possible. Specifically, it demonstrates an 8B-parameter LLaMA model trained on par with centralized training while the devices holding consecutive transformer blocks sit in four different locations, connected only via standard internet connections. Prior to this work, this was widely considered infeasible.
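For context, the baseline setting the paper studies can be sketched with vanilla PyTorch: one pipeline stage per machine, activations sent point-to-point over the network. All names and dimensions below are illustrative assumptions, and the techniques that make this practical at low bandwidth are the paper's contribution and are not shown here.
```python
# A minimal baseline sketch of the setting: consecutive transformer blocks live
# on different machines and exchange only activations over an ordinary network
# link. It uses plain torch.distributed point-to-point ops and deliberately
# omits the paper's low-bandwidth techniques, so it shows the setup, not the
# published method.
import torch
import torch.distributed as dist
import torch.nn as nn

def run_stage(rank: int, world_size: int, hidden: int = 1024, seq: int = 128) -> None:
    # Each rank owns only its own transformer block(s); no rank sees the whole model.
    block = nn.TransformerEncoderLayer(d_model=hidden, nhead=16, batch_first=True)

    x = torch.zeros(1, seq, hidden)
    if rank == 0:
        x = torch.randn(1, seq, hidden)   # first stage consumes the input batch
    else:
        dist.recv(x, src=rank - 1)        # later stages block on upstream activations

    y = block(x)                          # forward pass through this stage only

    if rank < world_size - 1:
        dist.send(y, dst=rank + 1)        # ship activations to the next location
    # The backward pass mirrors this, sending gradients in the reverse direction.

if __name__ == "__main__":
    dist.init_process_group("gloo")       # rank and world size come from the launcher (e.g. torchrun)
    run_stage(dist.get_rank(), dist.get_world_size())
```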
T. Ajanthan, S. Ramasinghe, Y. Zuo, G. Avraham, A. Long | ICML 2025
Pipeline parallelism allows large models to be trained across many small devices by slicing the network into sequential stages, but it introduces a “bubble” in which devices sit idle waiting on one another. The bubble slows both centralized and decentralized training, and the effect is more pronounced in the decentralized case because communication lag widens it further. We solve this problem, outperforming all existing asynchronous techniques and even the synchronous baseline.
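To make the bubble concrete: the standard estimate for a synchronous pipeline with p stages and m microbatches puts the idle fraction at roughly (p − 1)/(m + p − 1), and in a decentralized setting per-hop network latency effectively widens each stage boundary further. The snippet below only evaluates that textbook estimate; it is not the asynchronous schedule proposed in this paper.
```python
# Back-of-the-envelope estimate of the pipeline "bubble" (idle fraction) for a
# standard synchronous schedule: (p - 1) / (m + p - 1) for p stages and m
# microbatches. This is the textbook figure, not the paper's async analysis.

def bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    return (num_stages - 1) / (num_microbatches + num_stages - 1)

for p, m in [(4, 4), (4, 16), (8, 16), (8, 64)]:
    print(f"{p} stages, {m} microbatches -> {bubble_fraction(p, m):.0%} idle")
# 4 stages, 4 microbatches  -> 43% idle
# 4 stages, 16 microbatches -> 16% idle
```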